
mindocr

mindocr.data

mindocr.data.base_dataset
mindocr.data.base_dataset.BaseDataset

Bases: object

Base dataset to parse dataset files.

PARAMETER DESCRIPTION
data_dir

directory (or list of directories) containing the data

TYPE: Union[str, List[str]]

label_file

(list of) path(s) to the annotation file(s)

TYPE: Union[str, List[str]]

output_columns

names of elements in the output tuple of __getitem__

TYPE: List[str]

ATTRIBUTE DESCRIPTION
data_list

source data items (e.g., containing image path and raw annotation)

TYPE: List[Tuple]

Source code in mindocr\data\base_dataset.py
class BaseDataset(object):
    """
    Base dataset to parse dataset files.

    Args:
        data_dir (Union[str, List[str]]): directory (or list of directories) containing the data
        label_file (Union[str, List[str]], optional): (list of) path(s) to the annotation file(s)
        output_columns (List[str]): names of elements in the output tuple of __getitem__
    Attributes:
        data_list (List[Tuple]): source data items (e.g., containing image path and raw annotation)
    """

    def __init__(
        self,
        data_dir: Union[str, List[str]],
        label_file: Union[str, List[str]] = None,
        output_columns: List[str] = None,
        **kwargs,
    ):
        self._index = 0
        self.data_list = []

        # check files
        if isinstance(data_dir, str):
            data_dir = [data_dir]
        for f in data_dir:
            if not os.path.exists(f):
                raise ValueError(f"data_dir '{f}' does not existed. Please check the yaml file for both train and eval")
        self.data_dir = data_dir

        if label_file is not None:
            if isinstance(label_file, str):
                label_file = [label_file]
            for f in label_file:
                if not os.path.exists(f):
                    raise ValueError(
                        f"label_file '{f}' does not exist. Please check the yaml file for both train and eval"
                    )
        else:
            label_file = []
        self.label_file = label_file

        # must specify output column names
        self.output_columns = output_columns

    def __getitem__(self, index):
        # return self.data_list[index]
        raise NotImplementedError

    def set_output_columns(self, column_names: List[str]):
        self.output_columns = column_names

    def get_output_columns(self) -> List[str]:
        """
        get the column names for the output tuple of __getitem__, required for data mapping in the next step
        """
        # raise NotImplementedError
        return self.output_columns

    def __next__(self):
        if self._index >= len(self.data_list):
            raise StopIteration
        else:
            item = self.__getitem__(self._index)
            self._index += 1
            return item

    def __len__(self):
        return len(self.data_list)

    def _load_image_bytes(self, img_path):
        """load image bytes (prepared for decoding)"""
        with open(img_path, "rb") as f:
            image_bytes = f.read()
        return image_bytes
mindocr.data.base_dataset.BaseDataset.get_output_columns()

get the column names for the output tuple of __getitem__, required for data mapping in the next step

Source code in mindocr\data\base_dataset.py
def get_output_columns(self) -> List[str]:
    """
    get the column names for the output tuple of __getitem__, required for data mapping in the next step
    """
    # raise NotImplementedError
    return self.output_columns
mindocr.data.builder
mindocr.data.builder.build_dataset(dataset_config, loader_config, num_shards=None, shard_id=None, is_train=True, **kwargs)

Build dataset for training and evaluation.

PARAMETER DESCRIPTION
dataset_config

dataset parsing and processing configuration containing the following keys:
  - type (str): dataset class name, please choose from supported_dataset_types.
  - dataset_root (str): the root directory to store the (multiple) dataset(s)
  - data_dir (Union[str, List[str]]): directory to the data, which is a subfolder path relative to dataset_root. For multiple datasets, it is a list of subfolder paths.
  - label_file (Union[str, List[str]], optional): file path to the annotation, relative to dataset_root. For multiple datasets, it is a list of relative file paths. Not required if using LMDBDataset.
  - sample_ratio (float): the sampling ratio of the dataset.
  - shuffle (boolean): whether to shuffle the order of data samples.
  - transform_pipeline (list[dict]): each element corresponds to a transform operation on image and/or label
  - output_columns (list[str]): list of output features for each sample.
  - net_input_column_index (list[int]): input indices for the network forward func in output_columns

TYPE: dict

loader_config

dataloader configuration containing keys:
  - batch_size (int): batch size for the data loader
  - drop_remainder (boolean): whether to drop the data in the last batch when the total number of data samples cannot be divided by the batch_size
  - num_workers (int): number of subprocesses used to fetch the dataset in parallel.

TYPE: dict

num_shards

num of devices for distributed mode

TYPE: int, *optional* DEFAULT: None

shard_id

device id

TYPE: int, *optional* DEFAULT: None

is_train

whether it is in training stage

TYPE: boolean DEFAULT: True

**kwargs

optional args for extension. If refine_batch_size=True is given in kwargs, the batch size will be refined to be divisible, to avoid dropping remaining data samples in graph mode, typically used for precise evaluation.

DEFAULT: {}

Return

data_loader (Dataset): dataloader to generate data batch

Notes
  • The main data process pipeline in MindSpore contains 3 parts: 1) load data files and generate source dataset, 2) perform per-data-row mapping such as image augmentation, 3) generate batch and apply batch mapping.
  • Each of the three steps supports multiprocess. Detailed mechanism can be seen in https://www.mindspore.cn/docs/zh-CN/r2.0.0-alpha/api_python/mindspore.dataset.html
  • A data row is a data tuple item containing multiple elements such as (image_i, mask_i, label_i). A data column corresponds to an element in the tuple like 'image', 'label'.
  • The total number of num_workers used for data loading and processing should not be larger than the maximum number of CPU threads. Otherwise, it will lead to resource contention overhead. Especially for distributed training, num_parallel_workers should not be too large to avoid thread contention.
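As a concrete sketch of the worker budgeting that build_dataset applies (the core and device counts below are made-up example values; NUM_WORKERS_BATCH = 2 matches the source code):

```python
cores = 16       # example: CPU core count as reported by multiprocessing.cpu_count()
num_devices = 2  # example: num_shards in distributed mode
NUM_WORKERS_BATCH = 2

# default workers for the map stage: per-device share of cores, minus batch workers
num_workers_map = int(cores / num_devices - NUM_WORKERS_BATCH)
print(num_workers_map)  # 6

# a user-requested value is capped at the per-device core budget
requested = 10
num_workers = min(requested, int(cores / num_devices))
print(num_workers)  # 8
```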
Example
Load a DetDataset/RecDataset

from mindocr.data import build_dataset
data_config = {
    "type": "DetDataset",
    "dataset_root": "path/to/datasets/",
    "data_dir": "ic15/det/train/ch4_test_images",
    "label_file": "ic15/det/train/det_gt.txt",
    "sample_ratio": 1.0,
    "shuffle": False,
    "transform_pipeline": [
        {"DecodeImage": {"img_mode": "RGB", "to_float32": False}},
        {"DetLabelEncode": {}},
    ],
    "output_columns": ["image", "polys", "ignore_tags"],
    "net_input_column_index": [0],
    "label_column_index": [1, 2],
}
loader_config = dict(shuffle=True, batch_size=16, drop_remainder=False, num_workers=1)
data_loader = build_dataset(data_config, loader_config, num_shards=1, shard_id=0, is_train=True)

Source code in mindocr\data\builder.py
def build_dataset(
    dataset_config: dict,
    loader_config: dict,
    num_shards=None,
    shard_id=None,
    is_train=True,
    **kwargs,
):
    """
    Build dataset for training and evaluation.

    Args:
        dataset_config (dict): dataset parsing and processing configuration containing the following keys
            - type (str): dataset class name, please choose from `supported_dataset_types`.
            - dataset_root (str): the root directory to store the (multiple) dataset(s)
            - data_dir (Union[str, List[str]]): directory to the data, which is a subfolder path related to
              `dataset_root`. For multiple datasets, it is a list of subfolder paths.
            - label_file (Union[str, List[str]], *optional*): file path to the annotation related to the `dataset_root`.
              For multiple datasets, it is a list of relative file paths. Not required if using LMDBDataset.
            - sample_ratio (float): the sampling ratio of dataset.
            - shuffle (boolean): whether to shuffle the order of data samples.
            - transform_pipeline (list[dict]): each element corresponds to a transform operation on image and/or label
            - output_columns (list[str]): list of output features for each sample.
            - net_input_column_index (list[int]): input indices for network forward func in output_columns
        loader_config (dict): dataloader configuration containing keys:
            - batch_size (int): batch size for data loader
            - drop_remainder (boolean): whether to drop the data in the last batch when the total of data can not be
              divided by the batch_size
            - num_workers (int): number of subprocesses used to fetch the dataset in parallel.
        num_shards (int, *optional*): num of devices for distributed mode
        shard_id (int, *optional*): device id
        is_train (boolean): whether it is in training stage
        **kwargs: optional args for extension. If `refine_batch_size=True` is given in kwargs, the batch size will be
            refined to be divisible, to avoid dropping remaining data samples in graph mode, typically used for
            precise evaluation.

    Return:
        data_loader (Dataset): dataloader to generate data batch

    Notes:
        - The main data process pipeline in MindSpore contains 3 parts: 1) load data files and generate source dataset,
            2) perform per-data-row mapping such as image augmentation, 3) generate batch and apply batch mapping.
        - Each of the three steps supports multiprocess. Detailed mechanism can be seen in
            https://www.mindspore.cn/docs/zh-CN/r2.0.0-alpha/api_python/mindspore.dataset.html
        - A data row is a data tuple item containing multiple elements such as (image_i, mask_i, label_i).
            A data column corresponds to an element in the tuple like 'image', 'label'.
        - The total number of `num_workers` used for data loading and processing should not be larger than the maximum
            number of CPU threads. Otherwise, it will lead to resource contention overhead. Especially for distributed
            training, `num_parallel_workers` should not be too large to avoid thread contention.

    Example:
        >>> # Load a DetDataset/RecDataset
        >>> from mindocr.data import build_dataset
        >>> data_config = {
        >>>     "type": "DetDataset",
        >>>     "dataset_root": "path/to/datasets/",
        >>>     "data_dir": "ic15/det/train/ch4_test_images",
        >>>     "label_file": "ic15/det/train/det_gt.txt",
        >>>     "sample_ratio": 1.0,
        >>>     "shuffle": False,
        >>>     "transform_pipeline": [
        >>>         {
        >>>             "DecodeImage": {
        >>>                 "img_mode": "RGB",
        >>>                 "to_float32": False
        >>>                 }
        >>>         },
        >>>         {
        >>>             "DetLabelEncode": {},
        >>>         },
        >>>     ],
        >>>     "output_columns": ['image', 'polys', 'ignore_tags'],
        >>>     "net_input_column_index`": [0]
        >>>     "label_column_index": [1, 2]
        >>> }
        >>> loader_config = dict(shuffle=True, batch_size=16, drop_remainder=False, num_workers=1)
        >>> data_loader = build_dataset(data_config, loader_config, num_shards=1, shard_id=0, is_train=True)
    """
    # Check dataset paths (dataset_root, data_dir, and label_file) and update to absolute format
    dataset_config = _check_dataset_paths(dataset_config)

    # Set default multiprocessing params for data pipeline
    # num_parallel_workers: Number of subprocesses used to fetch the dataset, transform data, or load batch in parallel
    num_devices = 1 if num_shards is None else num_shards
    cores = multiprocessing.cpu_count()
    NUM_WORKERS_BATCH = 2
    NUM_WORKERS_MAP = int(
        cores / num_devices - NUM_WORKERS_BATCH
    )  # optimal num workers assuming all cpu cores are used in this job
    num_workers = loader_config.get("num_workers", NUM_WORKERS_MAP)
    if num_workers > int(cores / num_devices):
        print(
            f"WARNING: `num_workers` is adjusted to {int(cores / num_devices)} since {num_workers}x{num_devices} "
            f"exceeds the number of CPU cores {cores}"
        )
        num_workers = int(cores / num_devices)
    # prefetch_size: the length of the cache queue in the data pipeline for each worker, used to reduce waiting time.
    # Larger value leads to more memory consumption. Default: 16
    prefetch_size = loader_config.get("prefetch_size", 16)  #
    ms.dataset.config.set_prefetch_size(prefetch_size)
    # max_rowsize: MB of shared memory between processes to copy data. Only used when python_multiprocessing is True.
    max_rowsize = loader_config.get("max_rowsize", 64)
    # auto tune num_workers, prefetch. (This conflicts the profiler)
    # ms.dataset.config.set_autotune_interval(5)
    # ms.dataset.config.set_enable_autotune(True, "./dataproc_autotune_out")

    # 1. create source dataset (GeneratorDataset)
    # Invoke dataset class
    dataset_class_name = dataset_config.pop("type")
    assert dataset_class_name in supported_dataset_types, "Invalid dataset name"
    dataset_class = eval(dataset_class_name)
    dataset_args = dict(is_train=is_train, **dataset_config)
    dataset = dataset_class(**dataset_args)

    dataset_column_names = dataset.get_output_columns()
    # print('=> Dataset output columns: \n\t', dataset_column_names)

    # Generate source dataset (source w.r.t. the dataset.map pipeline)
    # based on python callable numpy dataset in parallel
    ds = ms.dataset.GeneratorDataset(
        dataset,
        column_names=dataset_column_names,
        num_parallel_workers=num_workers,
        num_shards=num_shards,
        shard_id=shard_id,
        python_multiprocessing=True,  # keep True to improve performance for heavy computation.
        max_rowsize=max_rowsize,
        shuffle=loader_config["shuffle"],
    )

    # 2. data mapping using mindata C lib (optional)
    # ds = ds.map(operations=transform_list, input_columns=['image', 'label'], num_parallel_workers=8,
    # python_multiprocessing=True)

    # 3. create loader
    # get batch of dataset by collecting batch_size consecutive data rows and apply batch operations
    num_samples = ds.get_dataset_size()
    batch_size = loader_config["batch_size"]

    device_id = 0 if shard_id is None else shard_id
    is_main_device = device_id == 0
    print(
        f"INFO: Creating dataloader (training={is_train}) for device {device_id}. Number of data samples: {num_samples}"
    )

    if "refine_batch_size" in kwargs:
        batch_size = _check_batch_size(num_samples, batch_size, refine=kwargs["refine_batch_size"])

    drop_remainder = loader_config.get("drop_remainder", is_train)
    if is_train and drop_remainder is False and is_main_device:
        print(
            "WARNING: `drop_remainder` should be True for training, otherwise the last batch may lead to training fail "
            "in Graph mode"
        )

    if not is_train:
        if drop_remainder and is_main_device:
            print(
                "WARNING: `drop_remainder` is forced to be False for evaluation to include the last batch for "
                "accurate evaluation."
            )
            drop_remainder = False

    dataloader = ds.batch(
        batch_size,
        drop_remainder=drop_remainder,
        num_parallel_workers=min(
            num_workers, 2
        ),  # set small workers for lite computation. TODO: increase for batch-wise mapping
        # input_columns=input_columns,
        # output_columns=batch_column,
        # per_batch_map=per_batch_map, # uncomment to use inner-batch transformation
    )

    return dataloader
mindocr.data.constants

Constant data augmentation parameters for the ImageNet dataset

mindocr.data.det_dataset
mindocr.data.det_dataset.DetDataset

Bases: BaseDataset

General dataset for text detection. The annotation format should follow:

.. code-block: none

# image file name       annotation info containing text and polygon points encoded by json.dumps
img_61.jpg      [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, {...}]
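To make the format above concrete, one detection label line (taken from the example) can be split on the tab and its annotation decoded with json.loads:

```python
import json

line = 'img_61.jpg\t[{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}]'
img_name, annot_str = line.strip().split("\t")
annots = json.loads(annot_str)

print(img_name)                    # img_61.jpg
print(annots[0]["transcription"])  # MASA
print(len(annots[0]["points"]))    # 4 corner points of one polygon
```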
PARAMETER DESCRIPTION
is_train

whether it is in training stage

TYPE: bool DEFAULT: True

data_dir

directory to the image data

TYPE: str DEFAULT: None

label_file

(list of) path(s) to the label file(s), where each line in the label file contains the image file name and its ocr annotation.

TYPE: Union[str, List[str]] DEFAULT: None

sample_ratio

sample ratios for the data items in label files

TYPE: Union[float, List[float]] DEFAULT: 1.0

shuffle(bool)

Optional, if not given, shuffle = is_train

transform_pipeline

list of dict, key - transform class name, value - a dict of param config. e.g., [{'DecodeImage': {'img_mode': 'BGR', 'channel_first': False}}] if None, default transform pipeline for text detection will be taken.

TYPE: List[dict] DEFAULT: None

output_columns

required, indicates the keys in data dict that are expected to output for dataloader. if None, all data keys will be used for return.

TYPE: list DEFAULT: None

global_config

additional info, used in data transformation, possible keys: - character_dict_path

RETURNS DESCRIPTION
data

Depending on the transform pipeline, __getitem__ returns a tuple for the specified data item.

TYPE: tuple

You can specify the output_columns arg to order the output data for dataloader.

Notes
  1. The data file structure should be like
       ├── data_dir
       │     ├── 000001.jpg
       │     ├── 000002.jpg
       │     ├── {image_file_name}
       ├── label_file.txt
Source code in mindocr\data\det_dataset.py
class DetDataset(BaseDataset):
    """
    General dataset for text detection
    The annotation format should follow:

    .. code-block: none

        # image file name\tannotation info containing text and polygon points encoded by json.dumps
        img_61.jpg\t[{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, {...}]

    Args:
        is_train (bool): whether it is in training stage
        data_dir (str):  directory to the image data
        label_file (Union[str, List[str]]): (list of) path to the label file(s),
            where each line in the label file contains the image file name and its ocr annotation.
        sample_ratio (Union[float, List[float]]): sample ratios for the data items in label files
        shuffle(bool): Optional, if not given, shuffle = is_train
        transform_pipeline: list of dict, key - transform class name, value - a dict of param config.
                    e.g., [{'DecodeImage': {'img_mode': 'BGR', 'channel_first': False}}]
                    if None, default transform pipeline for text detection will be taken.
        output_columns (list): required, indicates the keys in data dict that are expected to output for dataloader.
                            if None, all data keys will be used for return.
        global_config: additional info, used in data transformation, possible keys:
            - character_dict_path

    Returns:
        data (tuple): Depending on the transform pipeline, __getitem__ returns a tuple for the specified data item.
        You can specify the `output_columns` arg to order the output data for dataloader.

    Notes:
        1. The data file structure should be like
            ├── data_dir
            │     ├── 000001.jpg
            │     ├── 000002.jpg
            │     ├── {image_file_name}
            ├── label_file.txt
    """

    def __init__(
        self,
        is_train: bool = True,
        data_dir: Union[str, List[str]] = None,
        label_file: Union[List, str] = None,
        sample_ratio: Union[List, float] = 1.0,
        shuffle: bool = None,
        transform_pipeline: List[dict] = None,
        output_columns: List[str] = None,
        **kwargs,
    ):
        super().__init__(data_dir=data_dir, label_file=label_file, output_columns=output_columns)

        # check args
        if isinstance(sample_ratio, float):
            sample_ratio = [sample_ratio] * len(self.label_file)

        shuffle = shuffle if shuffle is not None else is_train

        # load data file list
        self.data_list = self.load_data_list(self.label_file, sample_ratio, shuffle)

        # create transform
        if transform_pipeline is not None:
            global_config = dict(is_train=is_train)
            self.transforms = create_transforms(transform_pipeline, global_config)
        else:
            raise ValueError("No transform pipeline is specified!")

        # prefetch the data keys, to fit GeneratorDataset
        _data = self.data_list[0].copy()  # WARNING: shallow copy. Do deep copy if necessary.
        _data = run_transforms(_data, transforms=self.transforms)
        _available_keys = list(_data.keys())

        if output_columns is None:
            self.output_columns = _available_keys
        else:
            self.output_columns = []
            for k in output_columns:
                if k in _data:
                    self.output_columns.append(k)
                else:
                    raise ValueError(
                        f"Key '{k}' does not exist in data (available keys: {_data.keys()}). "
                        "Please check the name or the completeness transformation pipeline."
                    )

    def __getitem__(self, index):
        data = self.data_list[index].copy()  # WARNING: shallow copy. Do deep copy if necessary.

        # perform transformation on data
        try:
            data = run_transforms(data, transforms=self.transforms)
            output_tuple = tuple(data[k] for k in self.output_columns)
        except Exception as e:
            print(f"Error occurred while processing the image: {self.data_list[index]['img_path']}\n", e, flush=True)
            return self[random.randrange(len(self.data_list))]  # return another random sample instead

        return output_tuple

    def load_data_list(
        self, label_file: List[str], sample_ratio: List[float], shuffle: bool = False, **kwargs
    ) -> List[dict]:
        """Load data list from label_file which contains infomation of image paths and annotations
        Args:
            label_file: annotation file path(s)
            sample_ratio sample ratio for data items in each annotation file
            shuffle: shuffle the data list
        Returns:
            data (List[dict]): A list of annotation dict, which contains keys: img_path, annot...
        """

        # parse image file path and annotation and load
        data_list = []
        for idx, label_fp in enumerate(label_file):
            img_dir = self.data_dir[idx]
            with open(label_fp, "r", encoding="utf-8") as f:
                lines = f.readlines()
                if shuffle:
                    lines = random.sample(lines, round(len(lines) * sample_ratio[idx]))
                else:
                    lines = lines[: round(len(lines) * sample_ratio[idx])]

                for line in lines:
                    img_name, annot_str = self._parse_annotation(line)
                    if annot_str == "[]":
                        continue
                    img_path = os.path.join(img_dir, img_name)
                    assert os.path.exists(img_path), "{} does not exist!".format(img_path)

                    data = {"img_path": img_path, "label": annot_str}
                    data_list.append(data)

        return data_list

    def _parse_annotation(self, data_line: str):
        data_line_tmp = data_line.strip()
        if "\t" in data_line_tmp:
            img_name, annot_str = data_line.strip().split("\t")
        elif " " in data_line_tmp:
            img_name, annot_str = data_line.strip().split(" ")
        else:
            raise ValueError(
                "Incorrect label file format: the file name and the label should be separated by " "a space or tab"
            )

        return img_name, annot_str
mindocr.data.det_dataset.DetDataset.load_data_list(label_file, sample_ratio, shuffle=False, **kwargs)

Load data list from label_file which contains information of image paths and annotations

PARAMETER DESCRIPTION
label_file

annotation file path(s)

TYPE: List[str]

sample_ratio

sample ratio for data items in each annotation file

TYPE: List[float]

shuffle

shuffle the data list

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
data

A list of annotation dict, which contains keys: img_path, annot...

TYPE: List[dict]

Source code in mindocr\data\det_dataset.py
def load_data_list(
    self, label_file: List[str], sample_ratio: List[float], shuffle: bool = False, **kwargs
) -> List[dict]:
    """Load data list from label_file which contains infomation of image paths and annotations
    Args:
        label_file: annotation file path(s)
        sample_ratio sample ratio for data items in each annotation file
        shuffle: shuffle the data list
    Returns:
        data (List[dict]): A list of annotation dict, which contains keys: img_path, annot...
    """

    # parse image file path and annotation and load
    data_list = []
    for idx, label_fp in enumerate(label_file):
        img_dir = self.data_dir[idx]
        with open(label_fp, "r", encoding="utf-8") as f:
            lines = f.readlines()
            if shuffle:
                lines = random.sample(lines, round(len(lines) * sample_ratio[idx]))
            else:
                lines = lines[: round(len(lines) * sample_ratio[idx])]

            for line in lines:
                img_name, annot_str = self._parse_annotation(line)
                if annot_str == "[]":
                    continue
                img_path = os.path.join(img_dir, img_name)
                assert os.path.exists(img_path), "{} does not exist!".format(img_path)

                data = {"img_path": img_path, "label": annot_str}
                data_list.append(data)

    return data_list
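The sample_ratio handling above can be sketched in isolation (the fake label lines below are illustrative): without shuffling, the first round(n * ratio) lines are kept; with shuffling, the same number of lines is drawn at random.

```python
import random

lines = [f"img_{i}.jpg\t[]" for i in range(10)]  # fake label lines
ratio = 0.5

head = lines[: round(len(lines) * ratio)]  # deterministic: first half
random.seed(0)  # seeded only to make this sketch reproducible
sampled = random.sample(lines, round(len(lines) * ratio))  # random half

print(len(head), len(sampled))  # 5 5
```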
mindocr.data.predict_dataset

Inference dataset class

mindocr.data.predict_dataset.PredictDataset

Bases: BaseDataset

Notes
  1. The data file structure should be like
       ├── img_dir
       │     ├── 000001.jpg
       │     ├── 000002.jpg
       │     ├── {image_file_name}
Source code in mindocr\data\predict_dataset.py
class PredictDataset(BaseDataset):
    """
    Notes:
    1. The data file structure should be like
        ├── img_dir
        │     ├── 000001.jpg
        │     ├── 000002.jpg
        │     ├── {image_file_name}
    """

    def __init__(
        self,
        # is_train: bool = False,
        dataset_root: str = "",
        data_dir: str = "",
        sample_ratio: Union[List, float] = 1.0,
        shuffle: bool = None,
        transform_pipeline: List[dict] = None,
        output_columns: List[str] = None,
        **kwargs,
    ):
        img_dir = os.path.join(dataset_root, data_dir)
        super().__init__(data_dir=img_dir, label_file=None, output_columns=output_columns)
        self.data_list = self.load_data_list(img_dir, sample_ratio, shuffle)

        # create transform
        if transform_pipeline is not None:
            self.transforms = create_transforms(transform_pipeline)  # , global_config=global_config)
        else:
            raise ValueError("No transform pipeline is specified!")

        # prefetch the data keys, to fit GeneratorDataset
        _data = self.data_list[0]
        _data = run_transforms(_data, transforms=self.transforms)
        _available_keys = list(_data.keys())
        if output_columns is None:
            self.output_columns = _available_keys
        else:
            self.output_columns = []
            for k in output_columns:
                if k in _data:
                    self.output_columns.append(k)
                else:
                    raise ValueError(
                        f"Key '{k}' does not exist in data (available keys: {_data.keys()}). "
                        "Please check the name or the completeness transformation pipeline."
                    )

    def __getitem__(self, index):
        data = self.data_list[index]

        # perform transformation on data
        data = run_transforms(data, transforms=self.transforms)
        output_tuple = tuple(data[k] for k in self.output_columns)

        return output_tuple

    def load_data_list(self, img_dir: str, sample_ratio: List[float], shuffle: bool = False, **kwargs) -> List[dict]:
        # read image file name
        img_filenames = os.listdir(img_dir)
        if shuffle:
            img_filenames = random.sample(img_filenames, round(len(img_filenames) * sample_ratio))
        else:
            img_filenames = img_filenames[: round(len(img_filenames) * sample_ratio)]

        img_paths = [{"img_path": os.path.join(img_dir, filename)} for filename in img_filenames]

        return img_paths
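The listing logic of load_data_list above can be exercised against a temporary directory (a stand-in for a real img_dir; sorting is added here only to make the sketch deterministic):

```python
import os
import tempfile

img_dir = tempfile.mkdtemp()  # stand-in for a real image directory
for name in ("a.jpg", "b.jpg", "c.jpg"):
    open(os.path.join(img_dir, name), "wb").close()

sample_ratio = 1.0
filenames = sorted(os.listdir(img_dir))
filenames = filenames[: round(len(filenames) * sample_ratio)]
img_paths = [{"img_path": os.path.join(img_dir, f)} for f in filenames]

print(len(img_paths))  # 3
print(os.path.basename(img_paths[0]["img_path"]))  # a.jpg
```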
mindocr.data.rec_dataset
mindocr.data.rec_dataset.RecDataset

Bases: DetDataset

General dataset for text recognition. The annotation format should follow:

.. code-block: none

# image file name       ground truth text
word_18.png     STAGE
word_19.png     HarbourFront
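For the recognition format above, each label line is just an image file name and its ground-truth text separated by a tab, so parsing is a plain split (lines taken from the example):

```python
lines = ["word_18.png\tSTAGE", "word_19.png\tHarbourFront"]
pairs = [tuple(ln.strip().split("\t")) for ln in lines]

print(pairs[0])  # ('word_18.png', 'STAGE')
print(pairs[1])  # ('word_19.png', 'HarbourFront')
```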
PARAMETER DESCRIPTION
is_train

whether it is in training stage

TYPE: bool

data_dir

directory to the image data

TYPE: str

label_file

(list of) path(s) to the label file(s), where each line in the label file contains the image file name and its ocr annotation.

TYPE: Union[str, List[str]]

sample_ratio

sample ratios for the data items in label files

TYPE: Union[float, List[float]]

shuffle(bool)

Optional, if not given, shuffle = is_train

transform_pipeline

list of dict, key - transform class name, value - a dict of param config. e.g., [{'DecodeImage': {'img_mode': 'BGR', 'channel_first': False}}] if None, the default transform pipeline for text recognition will be taken.

output_columns

required, indicates the keys in data dict that are expected to output for dataloader. if None, all data keys will be used for return.

TYPE: list

global_config

additional info, used in data transformation, possible keys: - character_dict_path

RETURNS DESCRIPTION
data

Depending on the transform pipeline, __getitem__ returns a tuple for the specified data item.

TYPE: tuple

You can specify the output_columns arg to order the output data for dataloader.

Notes
  1. The data file structure should be like
     ├── data_dir
     │     ├── 000001.jpg
     │     ├── 000002.jpg
     │     ├── {image_file_name}
     ├── label_file.txt
Source code in mindocr\data\rec_dataset.py
class RecDataset(DetDataset):
    """
    General dataset for text recognition
    The annotation format should follow:

    .. code-block:: none

        # image file name\tground truth text
        word_18.png\tSTAGE
        word_19.png\tHarbourFront

    Args:
        is_train (bool): whether it is in training stage
        data_dir (str):  directory to the image data
        label_file (Union[str, List[str]]): (list of) path to the label file(s),
            where each line in the label file contains the image file name and its OCR annotation.
        sample_ratio (Union[float, List[float]]): sample ratios for the data items in label files
        shuffle(bool): Optional, if not given, shuffle = is_train
        transform_pipeline: list of dict, key - transform class name, value - a dict of param config.
                    e.g., [{'DecodeImage': {'img_mode': 'BGR', 'channel_first': False}}]
                    if None, default transform pipeline for text recognition will be taken.
        output_columns (list): required, indicates the keys in data dict that are expected to output for dataloader.
                            if None, all data keys will be used for return.
        global_config: additional info, used in data transformation, possible keys:
            - character_dict_path

    Returns:
        data (tuple): Depending on the transform pipeline, __getitem__ returns a tuple for the specified data item.
        You can specify the `output_columns` arg to order the output data for dataloader.

    Notes:
        1. The data file structure should be like
            ├── data_dir
            │     ├── 000001.jpg
            │     ├── 000002.jpg
            │     ├── {image_file_name}
            ├── label_file.txt
    """
mindocr.data.rec_lmdb_dataset
mindocr.data.rec_lmdb_dataset.LMDBDataset

Bases: BaseDataset

Data iterator for OCR datasets including the ICDAR15 dataset. The annotation format is required to be aligned with PaddleOCR's, which can be done using the converter.py script.

PARAMETER DESCRIPTION
is_train

whether the dataset is for training

TYPE: bool DEFAULT: True

data_dir

data root directory for lmdb dataset(s)

TYPE: str DEFAULT: ''

shuffle

Optional, if not given, shuffle = is_train

TYPE: Optional[bool] DEFAULT: None

transform_pipeline

list of dict, key - transform class name, value - a dict of param config. e.g., [{'DecodeImage': {'img_mode': 'BGR', 'channel_first': False}}]; if None, default transform pipeline for text recognition will be taken.

TYPE: Optional[List[dict]] DEFAULT: None

output_columns

optional, indicates the keys in data dict that are expected to output for dataloader. if None, all data keys will be used for return.

TYPE: list DEFAULT: None

filter_max_len

Filter the records where the label is longer than the max_text_len.

TYPE: bool DEFAULT: False

max_text_len

The maximum text length the dataloader expected.

TYPE: int DEFAULT: None

RETURNS DESCRIPTION
data

Depending on the transform pipeline, __getitem__ returns a tuple for the specified data item.

TYPE: tuple

You can specify the output_columns arg to order the output data for dataloader.

Notes
  1. Dataset file structure should follow:
     data_dir
     ├── dataset01
     │   ├── data.mdb
     │   ├── lock.mdb
     ├── dataset02
     │   ├── data.mdb
     │   ├── lock.mdb
     ├── ...
Source code in mindocr\data\rec_lmdb_dataset.py
class LMDBDataset(BaseDataset):
    """Data iterator for ocr datasets including ICDAR15 dataset.
    The annotaiton format is required to aligned to paddle, which can be done using the `converter.py` script.

    Args:
        is_train: whether the dataset is for training
        data_dir: data root directory for lmdb dataset(s)
        shuffle: Optional, if not given, shuffle = is_train
        transform_pipeline: list of dict, key - transform class name, value - a dict of param config.
                    e.g., [{'DecodeImage': {'img_mode': 'BGR', 'channel_first': False}}]
                    if None, default transform pipeline for text recognition will be taken.
        output_columns (list): optional, indicates the keys in data dict that are expected to output for dataloader.
            if None, all data keys will be used for return.
        filter_max_len (bool): Filter the records where the label is longer than the `max_text_len`.
        max_text_len (int): The maximum text length the dataloader expected.

    Returns:
        data (tuple): Depending on the transform pipeline, __getitem__ returns a tuple for the specified data item.
        You can specify the `output_columns` arg to order the output data for dataloader.

    Notes:
        1. Dataset file structure should follow:
            data_dir
            ├── dataset01
                ├── data.mdb
                ├── lock.mdb
            ├── dataset02
                ├── data.mdb
                ├── lock.mdb
            ├── ...
    """

    def __init__(
        self,
        is_train: bool = True,
        data_dir: str = "",
        sample_ratio: float = 1.0,
        shuffle: Optional[bool] = None,
        transform_pipeline: Optional[List[dict]] = None,
        output_columns: Optional[List[str]] = None,
        filter_max_len: bool = False,
        max_text_len: Optional[int] = None,
        **kwargs: Any,
    ):
        self.data_dir = data_dir
        self.filter_max_len = filter_max_len
        self.max_text_len = max_text_len

        shuffle = shuffle if shuffle is not None else is_train

        self.lmdb_sets = self.load_list_of_hierarchical_lmdb_dataset(data_dir)
        if len(self.lmdb_sets) == 0:
            raise ValueError(f"Cannot find any lmdb dataset under `{data_dir}`. Please check the data path is correct.")
        self.data_idx_order_list = self.get_dataset_idx_orders(sample_ratio, shuffle)

        # filter the max length
        if filter_max_len:
            if max_text_len is None:
                raise ValueError("`max_text_len` must be provided when `filter_max_len` is True.")
            self.data_idx_order_list = self.filter_idx_list(self.data_idx_order_list)

        # create transform
        if transform_pipeline is not None:
            self.transforms = create_transforms(transform_pipeline)
        else:
            raise ValueError("No transform pipeline is specified!")

        self.prefetch(output_columns)

    def prefetch(self, output_columns):
        # prefetch the data keys, to fit GeneratorDataset
        lmdb_idx, file_idx = self.data_idx_order_list[0]
        lmdb_idx = int(lmdb_idx)
        file_idx = int(file_idx)
        sample_info = self.get_lmdb_sample_info(self.lmdb_sets[lmdb_idx]["txn"], file_idx)
        _data = {"img_lmdb": sample_info[0], "label": sample_info[1]}
        _data = run_transforms(_data, transforms=self.transforms)
        _available_keys = list(_data.keys())

        if output_columns is None:
            self.output_columns = _available_keys
        else:
            self.output_columns = []
            for k in output_columns:
                if k in _data:
                    self.output_columns.append(k)
                else:
                    raise ValueError(
                        f"Key {k} does not exist in data (available keys: {_data.keys()}). "
                        "Please check the name or the completeness transformation pipeline."
                    )

    def filter_idx_list(self, idx_list: np.ndarray) -> np.ndarray:
        print("Start filtering the idx list...")
        new_idx_list = list()
        for lmdb_idx, file_idx in idx_list:
            label = self.get_lmdb_sample_info(self.lmdb_sets[int(lmdb_idx)]["txn"], int(file_idx), label_only=True)
            if len(label) > self.max_text_len:
                print(
                    f"WARNING: skip the label with length ({len(label)}), "
                    f"which is longer than than max length ({self.max_text_len})."
                )
                continue
            new_idx_list.append((lmdb_idx, file_idx))
        new_idx_list = np.array(new_idx_list)
        return new_idx_list

    def load_list_of_hierarchical_lmdb_dataset(self, data_dir):
        if isinstance(data_dir, str):
            results = self.load_hierarchical_lmdb_dataset(data_dir)
        elif isinstance(data_dir, list):
            results = {}
            for sub_data_dir in data_dir:
                start_idx = len(results)
                lmdb_sets = self.load_hierarchical_lmdb_dataset(sub_data_dir, start_idx)
                results.update(lmdb_sets)
        else:
            results = {}

        return results

    def load_hierarchical_lmdb_dataset(self, data_dir, start_idx=0):
        lmdb_sets = {}
        dataset_idx = start_idx
        for rootdir, dirs, _ in os.walk(data_dir + "/"):
            if not dirs:
                env = lmdb.open(rootdir, max_readers=32, readonly=True, lock=False, readahead=False, meminit=False)
                txn = env.begin(write=False)
                data_size = int(txn.get("num-samples".encode()))
                lmdb_sets[dataset_idx] = {"rootdir": rootdir, "env": env, "txn": txn, "data_size": data_size}
                dataset_idx += 1
        return lmdb_sets

    def get_dataset_idx_orders(self, sample_ratio, shuffle):
        n_lmdbs = len(self.lmdb_sets)
        total_sample_num = 0
        for idx in range(n_lmdbs):
            total_sample_num += self.lmdb_sets[idx]["data_size"]
        data_idx_order_list = np.zeros((total_sample_num, 2))
        beg_idx = 0
        for idx in range(n_lmdbs):
            tmp_sample_num = self.lmdb_sets[idx]["data_size"]
            end_idx = beg_idx + tmp_sample_num
            data_idx_order_list[beg_idx:end_idx, 0] = idx
            data_idx_order_list[beg_idx:end_idx, 1] = list(range(tmp_sample_num))
            data_idx_order_list[beg_idx:end_idx, 1] += 1  # lmdb record indices are 1-based
            beg_idx = beg_idx + tmp_sample_num

        if shuffle:
            np.random.shuffle(data_idx_order_list)

        data_idx_order_list = data_idx_order_list[: round(len(data_idx_order_list) * sample_ratio)]

        return data_idx_order_list

    def get_lmdb_sample_info(self, txn, idx, label_only=False):
        label_key = "label-%09d".encode() % idx
        label = txn.get(label_key)
        if label is None:
            raise ValueError(f"Cannot find key {label_key}")
        label = label.decode("utf-8")

        if label_only:
            return label

        img_key = "image-%09d".encode() % idx
        imgbuf = txn.get(img_key)
        return imgbuf, label

    def __getitem__(self, idx):
        lmdb_idx, file_idx = self.data_idx_order_list[idx]
        sample_info = self.get_lmdb_sample_info(self.lmdb_sets[int(lmdb_idx)]["txn"], int(file_idx))

        data = {"img_lmdb": sample_info[0], "label": sample_info[1]}

        # perform transformation on data
        data = run_transforms(data, transforms=self.transforms)
        output_tuple = tuple(data[k] for k in self.output_columns)

        return output_tuple

    def __len__(self):
        return self.data_idx_order_list.shape[0]
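The 1-based record keys (`label-%09d`, `image-%09d`) and the `(lmdb_idx, file_idx)` order list built by `get_dataset_idx_orders` can be illustrated without an actual lmdb file. A minimal sketch of the indexing logic (the helper name is hypothetical):

```python
import numpy as np

def build_idx_order(data_sizes, sample_ratio=1.0):
    # mirrors get_dataset_idx_orders: column 0 is the lmdb set index,
    # column 1 the 1-based record index inside that set
    total = sum(data_sizes)
    order = np.zeros((total, 2), dtype=np.int64)
    beg = 0
    for set_idx, n in enumerate(data_sizes):
        order[beg:beg + n, 0] = set_idx
        order[beg:beg + n, 1] = np.arange(1, n + 1)
        beg += n
    return order[: round(total * sample_ratio)]

order = build_idx_order([3, 2])           # two lmdb sets with 3 and 2 records
label_key = b"label-%09d" % int(order[0, 1])
print(order.tolist())  # [[0, 1], [0, 2], [0, 3], [1, 1], [1, 2]]
print(label_key)       # b'label-000000001'
```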
mindocr.data.transforms

transforms init

mindocr.data.transforms.det_east_transforms
mindocr.data.transforms.det_east_transforms.EASTProcessTrain
Source code in mindocr\data\transforms\det_east_transforms.py
class EASTProcessTrain:
    def __init__(self, scale=0.25, length=512, **kwargs):
        super(EASTProcessTrain, self).__init__()
        self.scale = scale
        self.length = length

    def __call__(self, data):
        vertices, labels = self._extract_vertices(data["label"])
        img = Image.fromarray(data["image"])
        img, vertices = self._adjust_height(img, vertices)
        img, vertices = self._adjust_width(img, vertices)
        if np.random.rand() < 0.5:
            img, vertices = self._rotate_img(img, vertices)
        img, vertices = self._crop_img(img, vertices, labels, self.length)
        score_map, geo_map, ignored_map = self._get_score_geo(img, vertices, labels, self.scale, self.length)
        score_map = score_map.transpose(2, 0, 1)
        ignored_map = ignored_map.transpose(2, 0, 1)
        geo_map = geo_map.transpose(2, 0, 1)
        if np.sum(score_map) < 1:  # avoid an all-zero score map
            score_map[0, 0, 0] = 1
        image = np.asarray(img)
        data["image"] = image
        data["score_map"] = score_map
        data["geo_map"] = geo_map
        data["training_mask"] = ignored_map
        return data

    def _cal_distance(self, x1, y1, x2, y2):
        """calculate the Euclidean distance"""
        return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)

    def _move_points(self, vertices, index1, index2, r, coef):
        """
        move the two points to shrink edge
        Input:
          vertices: vertices of text region <numpy.ndarray, (8,)>
          index1  : offset of point1
          index2  : offset of point2
          r       : [r1, r2, r3, r4] in paper
          coef    : shrink ratio in paper
        Output:
          vertices: vertices where one edge has been shrunk
        """
        index1 = index1 % 4
        index2 = index2 % 4
        x1_index = index1 * 2 + 0
        y1_index = index1 * 2 + 1
        x2_index = index2 * 2 + 0
        y2_index = index2 * 2 + 1

        r1 = r[index1]
        r2 = r[index2]
        length_x = vertices[x1_index] - vertices[x2_index]
        length_y = vertices[y1_index] - vertices[y2_index]
        length = self._cal_distance(vertices[x1_index], vertices[y1_index], vertices[x2_index], vertices[y2_index])
        if length > 1:
            ratio = (r1 * coef) / length
            vertices[x1_index] += ratio * (-length_x)
            vertices[y1_index] += ratio * (-length_y)
            ratio = (r2 * coef) / length
            vertices[x2_index] += ratio * length_x
            vertices[y2_index] += ratio * length_y
        return vertices

    def _shrink_poly(self, vertices, coef=0.3):
        """
        shrink the text region
        Input:
          vertices: vertices of text region <numpy.ndarray, (8,)>
          coef    : shrink ratio in paper
        Output:
          v       : vertices of shrunk text region <numpy.ndarray, (8,)>
        """
        x1, y1, x2, y2, x3, y3, x4, y4 = vertices
        r1 = min(self._cal_distance(x1, y1, x2, y2), self._cal_distance(x1, y1, x4, y4))
        r2 = min(self._cal_distance(x2, y2, x1, y1), self._cal_distance(x2, y2, x3, y3))
        r3 = min(self._cal_distance(x3, y3, x2, y2), self._cal_distance(x3, y3, x4, y4))
        r4 = min(self._cal_distance(x4, y4, x1, y1), self._cal_distance(x4, y4, x3, y3))
        r = [r1, r2, r3, r4]

        # obtain offset to perform move_points() automatically
        if self._cal_distance(x1, y1, x2, y2) + self._cal_distance(x3, y3, x4, y4) > self._cal_distance(
            x2, y2, x3, y3
        ) + self._cal_distance(x1, y1, x4, y4):
            offset = 0  # two longer edges are (x1y1-x2y2) & (x3y3-x4y4)
        else:
            offset = 1  # two longer edges are (x2y2-x3y3) & (x4y4-x1y1)

        v = vertices.copy()
        v = self._move_points(v, 0 + offset, 1 + offset, r, coef)
        v = self._move_points(v, 2 + offset, 3 + offset, r, coef)
        v = self._move_points(v, 1 + offset, 2 + offset, r, coef)
        v = self._move_points(v, 3 + offset, 4 + offset, r, coef)
        return v

    def _get_rotate_mat(self, theta):
        """positive theta value means rotate clockwise"""
        return np.array([[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]])

    def _rotate_vertices(self, vertices, theta, anchor=None):
        """
        rotate vertices around anchor
        Input:
          vertices: vertices of text region <numpy.ndarray, (8,)>
          theta   : angle in radian measure
          anchor  : fixed position during rotation
        Output:
          rotated vertices <numpy.ndarray, (8,)>
        """
        v = vertices.reshape((4, 2)).T
        if anchor is None:
            anchor = v[:, :1]
        rotate_mat = self._get_rotate_mat(theta)
        res = np.dot(rotate_mat, v - anchor)
        return (res + anchor).T.reshape(-1)

    def _get_boundary(self, vertices):
        """
        get the tight boundary around given vertices
        Input:
          vertices: vertices of text region <numpy.ndarray, (8,)>
        Output:
          the boundary
        """
        x1, y1, x2, y2, x3, y3, x4, y4 = vertices
        x_min = min(x1, x2, x3, x4)
        x_max = max(x1, x2, x3, x4)
        y_min = min(y1, y2, y3, y4)
        y_max = max(y1, y2, y3, y4)
        return x_min, x_max, y_min, y_max

    def _cal_error(self, vertices):
        """
        default orientation is x1y1 : left-top, x2y2 : right-top, x3y3 : right-bot, x4y4 : left-bot
        calculate the difference between the vertices orientation and default orientation
        Input:
          vertices: vertices of text region <numpy.ndarray, (8,)>
        Output:
          err     : difference measure
        """
        x_min, x_max, y_min, y_max = self._get_boundary(vertices)
        x1, y1, x2, y2, x3, y3, x4, y4 = vertices
        err = (
            self._cal_distance(x1, y1, x_min, y_min)
            + self._cal_distance(x2, y2, x_max, y_min)
            + self._cal_distance(x3, y3, x_max, y_max)
            + self._cal_distance(x4, y4, x_min, y_max)
        )
        return err

    def _find_min_rect_angle(self, vertices):
        """
        find the best angle to rotate poly and obtain min rectangle
        Input:
          vertices: vertices of text region <numpy.ndarray, (8,)>
        Output:
          the best angle <radian measure>
        """
        angle_interval = 1
        angle_list = list(range(-90, 90, angle_interval))
        area_list = []
        for theta in angle_list:
            rotated = self._rotate_vertices(vertices, theta / 180 * math.pi)
            x1, y1, x2, y2, x3, y3, x4, y4 = rotated
            temp_area = (max(x1, x2, x3, x4) - min(x1, x2, x3, x4)) * (max(y1, y2, y3, y4) - min(y1, y2, y3, y4))
            area_list.append(temp_area)

        sorted_area_index = sorted(list(range(len(area_list))), key=lambda k: area_list[k])
        min_error = float("inf")
        best_index = -1
        rank_num = 10
        # find the best angle with correct orientation
        for index in sorted_area_index[:rank_num]:
            rotated = self._rotate_vertices(vertices, angle_list[index] / 180 * math.pi)
            temp_error = self._cal_error(rotated)
            if temp_error < min_error:
                min_error = temp_error
                best_index = index
        return angle_list[best_index] / 180 * math.pi

    def _is_cross_text(self, start_loc, length, vertices):
        """
        check if the crop image crosses text regions
        Input:
          start_loc: left-top position
          length   : length of crop image
          vertices : vertices of text regions <numpy.ndarray, (n,8)>
        Output:
          True if crop image crosses text region
        """
        if vertices.size == 0:
            return False
        start_w, start_h = start_loc
        a = np.array(
            [start_w, start_h, start_w + length, start_h, start_w + length, start_h + length, start_w, start_h + length]
        ).reshape((4, 2))
        p1 = Polygon(a).convex_hull
        for vertice in vertices:
            p2 = Polygon(vertice.reshape((4, 2))).convex_hull
            inter = p1.intersection(p2).area
            if 0.01 <= inter / p2.area <= 0.99:
                return True
        return False

    def _crop_img(self, img, vertices, labels, length):
        """
        crop img patches to obtain batch and augment
        Input:
          img         : PIL Image
          vertices    : vertices of text regions <numpy.ndarray, (n,8)>
          labels      : 1->valid, 0->ignore, <numpy.ndarray, (n,)>
          length      : length of cropped image region
        Output:
          region      : cropped image region
          new_vertices: new vertices in cropped region
        """
        h, w = img.height, img.width
        # confirm the shortest side of image >= length
        if h >= w and w < length:
            img = img.resize((length, int(h * length / w)), Image.BILINEAR)
        elif h < w and h < length:
            img = img.resize((int(w * length / h), length), Image.BILINEAR)
        ratio_w = img.width / w
        ratio_h = img.height / h
        assert ratio_w >= 1 and ratio_h >= 1

        new_vertices = np.zeros(vertices.shape)
        if vertices.size > 0:
            new_vertices[:, [0, 2, 4, 6]] = vertices[:, [0, 2, 4, 6]] * ratio_w
            new_vertices[:, [1, 3, 5, 7]] = vertices[:, [1, 3, 5, 7]] * ratio_h

        # find random position
        remain_h = img.height - length
        remain_w = img.width - length
        flag = True
        cnt = 0
        while flag and cnt < 1000:
            cnt += 1
            start_w = int(np.random.rand() * remain_w)
            start_h = int(np.random.rand() * remain_h)
            flag = self._is_cross_text([start_w, start_h], length, new_vertices[labels == 1, :])
        box = (start_w, start_h, start_w + length, start_h + length)
        region = img.crop(box)
        if new_vertices.size == 0:
            return region, new_vertices

        new_vertices[:, [0, 2, 4, 6]] -= start_w
        new_vertices[:, [1, 3, 5, 7]] -= start_h
        return region, new_vertices

    def _rotate_all_pixels(self, rotate_mat, anchor_x, anchor_y, length):
        """
        get rotated locations of all pixels for next stages
        Input:
          rotate_mat: rotation matrix
          anchor_x  : fixed x position
          anchor_y  : fixed y position
          length    : length of image
        Output:
          rotated_x : rotated x positions <numpy.ndarray, (length,length)>
          rotated_y : rotated y positions <numpy.ndarray, (length,length)>
        """
        x = np.arange(length)
        y = np.arange(length)
        x, y = np.meshgrid(x, y)
        x_lin = x.reshape((1, x.size))
        y_lin = y.reshape((1, x.size))
        coord_mat = np.concatenate((x_lin, y_lin), 0)
        rotated_coord = np.matmul(
            rotate_mat.astype(np.float16), (coord_mat - np.array([[anchor_x], [anchor_y]])).astype(np.float16)
        ) + np.array([[anchor_x], [anchor_y]])
        rotated_x = rotated_coord[0, :].reshape(x.shape)
        rotated_y = rotated_coord[1, :].reshape(y.shape)
        return rotated_x, rotated_y

    def _adjust_height(self, img, vertices, ratio=0.2):
        """
        adjust height of image to aug data
        Input:
          img         : PIL Image
          vertices    : vertices of text regions <numpy.ndarray, (n,8)>
          ratio       : height changes in [0.8, 1.2]
        Output:
          img         : adjusted PIL Image
          new_vertices: adjusted vertices
        """
        ratio_h = 1 + ratio * (np.random.rand() * 2 - 1)
        old_h = img.height
        new_h = int(np.around(old_h * ratio_h))
        img = img.resize((img.width, new_h), Image.BILINEAR)

        new_vertices = vertices.copy()
        if vertices.size > 0:
            new_vertices[:, [1, 3, 5, 7]] = vertices[:, [1, 3, 5, 7]] * (new_h / old_h)
        return img, new_vertices

    def _adjust_width(self, img, vertices, ratio=0.2):
        """
        adjust width of image to aug data
        Input:
          img         : PIL Image
          vertices    : vertices of text regions <numpy.ndarray, (n,8)>
          ratio       : width changes in [0.8, 1.2]
        Output:
          img         : adjusted PIL Image
          new_vertices: adjusted vertices
        """
        ratio_w = 1 + ratio * (np.random.rand() * 2 - 1)
        old_w = img.width
        new_w = int(np.around(old_w * ratio_w))
        img = img.resize((new_w, img.height), Image.BILINEAR)

        new_vertices = vertices.copy()
        if vertices.size > 0:
            new_vertices[:, [0, 2, 4, 6]] = vertices[:, [0, 2, 4, 6]] * (new_w / old_w)
        return img, new_vertices

    def _rotate_img(self, img, vertices, angle_range=10):
        """
        rotate image [-10, 10] degree to aug data
        Input:
          img         : PIL Image
          vertices    : vertices of text regions <numpy.ndarray, (n,8)>
          angle_range : rotate range
        Output:
          img         : rotated PIL Image
          new_vertices: rotated vertices
        """
        center_x = (img.width - 1) / 2
        center_y = (img.height - 1) / 2
        angle = angle_range * (np.random.rand() * 2 - 1)
        img = img.rotate(angle, Image.BILINEAR)
        new_vertices = np.zeros(vertices.shape)
        for i, vertice in enumerate(vertices):
            new_vertices[i, :] = self._rotate_vertices(
                vertice, -angle / 180 * math.pi, np.array([[center_x], [center_y]])
            )
        return img, new_vertices

    def _get_score_geo(self, img, vertices, labels, scale, length):
        """
        generate score gt and geometry gt
        Input:
          img     : PIL Image
          vertices: vertices of text regions <numpy.ndarray, (n,8)>
          labels  : 1->valid, 0->ignore, <numpy.ndarray, (n,)>
          scale   : feature map / image
          length  : image length
        Output:
          score gt, geo gt, ignored
        """
        score_map = np.zeros((int(img.height * scale), int(img.width * scale), 1), np.float32)
        geo_map = np.zeros((int(img.height * scale), int(img.width * scale), 5), np.float32)
        ignored_map = np.zeros((int(img.height * scale), int(img.width * scale), 1), np.float32)

        index = np.arange(0, length, int(1 / scale))
        index_x, index_y = np.meshgrid(index, index)
        ignored_polys = []
        polys = []

        for i, vertice in enumerate(vertices):
            if labels[i] == 0:
                ignored_polys.append(np.around(scale * vertice.reshape((4, 2))).astype(np.int32))
                continue

            poly = np.around(scale * self._shrink_poly(vertice).reshape((4, 2))).astype(np.int32)
            polys.append(poly)
            temp_mask = np.zeros(score_map.shape[:-1], np.float32)
            cv2.fillPoly(temp_mask, [poly], 1)

            theta = self._find_min_rect_angle(vertice)
            rotate_mat = self._get_rotate_mat(theta)

            rotated_vertices = self._rotate_vertices(vertice, theta)
            x_min, x_max, y_min, y_max = self._get_boundary(rotated_vertices)
            rotated_x, rotated_y = self._rotate_all_pixels(rotate_mat, vertice[0], vertice[1], length)

            d1 = rotated_y - y_min
            d1[d1 < 0] = 0
            d2 = y_max - rotated_y
            d2[d2 < 0] = 0
            d3 = rotated_x - x_min
            d3[d3 < 0] = 0
            d4 = x_max - rotated_x
            d4[d4 < 0] = 0
            geo_map[:, :, 0] += d1[index_y, index_x] * temp_mask
            geo_map[:, :, 1] += d2[index_y, index_x] * temp_mask
            geo_map[:, :, 2] += d3[index_y, index_x] * temp_mask
            geo_map[:, :, 3] += d4[index_y, index_x] * temp_mask
            geo_map[:, :, 4] += theta * temp_mask

        cv2.fillPoly(ignored_map, ignored_polys, 1)
        cv2.fillPoly(score_map, polys, 1)
        return score_map, geo_map, ignored_map

    def _extract_vertices(self, data_labels):
        """
        extract vertices info from the annotation string
        Input:
          data_labels: annotation string of text regions
        Output:
          vertices: vertices of text regions <numpy.ndarray, (n,8)>
          labels  : 1->valid, 0->ignore, <numpy.ndarray, (n,)>
        """
        vertices_list = []
        labels_list = []
        data_labels = eval(data_labels)  # parse the annotation string into a list of dicts
        for data_label in data_labels:
            vertices = data_label["points"]
            vertices = [item for point in vertices for item in point]
            vertices_list.append(vertices)
            labels = 0 if data_label["transcription"] == "###" else 1
            labels_list.append(labels)
        return np.array(vertices_list), np.array(labels_list)
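The vertex rotation performed by `_rotate_vertices` and `_get_rotate_mat` can be checked in isolation. A minimal sketch of the same matrix math (note the matrix convention follows `_get_rotate_mat`, which reads as clockwise in image coordinates where y points down):

```python
import math
import numpy as np

def rotate_vertices(vertices, theta, anchor=None):
    # rotate the 4 (x, y) vertices of a quad around the anchor
    # (default anchor: the first vertex), as in _rotate_vertices
    v = vertices.reshape((4, 2)).T
    if anchor is None:
        anchor = v[:, :1]
    rot = np.array([[math.cos(theta), -math.sin(theta)],
                    [math.sin(theta), math.cos(theta)]])
    res = rot @ (v - anchor)
    return (res + anchor).T.reshape(-1)

square = np.array([0, 0, 2, 0, 2, 2, 0, 2], dtype=float)
rotated = rotate_vertices(square, math.pi / 2)  # 90 degrees around (0, 0)
print(np.round(rotated, 6))  # [ 0.  0.  0.  2. -2.  2. -2.  0.]
```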
mindocr.data.transforms.det_transforms

transforms for text detection tasks.

mindocr.data.transforms.det_transforms.BorderMap
Source code in mindocr\data\transforms\det_transforms.py
class BorderMap:
    def __init__(self, shrink_ratio=0.4, thresh_min=0.3, thresh_max=0.7, **kwargs):
        self._thresh_min = thresh_min
        self._thresh_max = thresh_max
        self._dist_coef = 1 - shrink_ratio**2

    def __call__(self, data):
        border = np.zeros(data["image"].shape[:2], dtype=np.float32)
        mask = np.zeros(data["image"].shape[:2], dtype=np.float32)

        for i in range(len(data["polys"])):
            if not data["ignore_tags"][i]:
                self._draw_border(data["polys"][i], border, mask=mask)
        border = border * (self._thresh_max - self._thresh_min) + self._thresh_min

        data["thresh_map"] = border
        data["thresh_mask"] = mask
        return data

    def _draw_border(self, np_poly, border, mask):
        # draw mask
        poly = Polygon(np_poly)
        distance = self._dist_coef * poly.area / poly.length
        padded_polygon = np.array(expand_poly(np_poly, distance)[0], dtype=np.int32)
        cv2.fillPoly(mask, [padded_polygon], 1.0)

        # draw border
        min_vals, max_vals = np.min(padded_polygon, axis=0), np.max(padded_polygon, axis=0)
        width, height = max_vals - min_vals + 1
        np_poly = np_poly - min_vals

        xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
        ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))

        distance_map = [self._distance(xs, ys, p1, p2) for p1, p2 in zip(np_poly, np.roll(np_poly, 1, axis=0))]
        distance_map = np.clip(np.array(distance_map, dtype=np.float32) / distance, 0, 1).min(axis=0)  # NOQA

        min_valid = np.clip(min_vals, 0, np.array(border.shape[::-1]) - 1)  # shape reverse order: w, h
        max_valid = np.clip(max_vals, 0, np.array(border.shape[::-1]) - 1)

        border[min_valid[1] : max_valid[1] + 1, min_valid[0] : max_valid[0] + 1] = np.fmax(
            1
            - distance_map[
                min_valid[1] - min_vals[1] : max_valid[1] - max_vals[1] + height,
                min_valid[0] - min_vals[0] : max_valid[0] - max_vals[0] + width,
            ],
            border[min_valid[1] : max_valid[1] + 1, min_valid[0] : max_valid[0] + 1],
        )

    @staticmethod
    def _distance(xs, ys, point_1, point_2):
        """
        compute the distance from each point to a line
        ys: coordinates in the first axis
        xs: coordinates in the second axis
        point_1, point_2: (x, y), the end of the line
        """
        a_sq = np.square(xs - point_1[0]) + np.square(ys - point_1[1])
        b_sq = np.square(xs - point_2[0]) + np.square(ys - point_2[1])
        c_sq = np.square(point_1[0] - point_2[0]) + np.square(point_1[1] - point_2[1])

        cos = (a_sq + b_sq - c_sq) / (2 * np.sqrt(a_sq * b_sq))
        sin_sq = np.nan_to_num(1 - np.square(cos))
        result = np.sqrt(a_sq * b_sq * sin_sq / c_sq)

        result[cos >= 0] = np.sqrt(np.fmin(a_sq, b_sq))[cos >= 0]
        return result
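The law-of-cosines trick in `_distance` above computes, for every pixel, its distance to a polygon edge: the perpendicular distance to the supporting line when the pixel projects onto the segment, and the distance to the nearest endpoint otherwise. A minimal standalone sketch of the same computation (the function name `point_segment_distance` is illustrative, not part of mindocr):

```python
import numpy as np

def point_segment_distance(xs, ys, p1, p2):
    # Squared distances from each grid point to the two segment endpoints
    a_sq = np.square(xs - p1[0]) + np.square(ys - p1[1])
    b_sq = np.square(xs - p2[0]) + np.square(ys - p2[1])
    # Squared segment length
    c_sq = np.square(p1[0] - p2[0]) + np.square(p1[1] - p2[1])

    # Law of cosines: cosine of the angle at the grid point
    cos = (a_sq + b_sq - c_sq) / (2 * np.sqrt(a_sq * b_sq))
    sin_sq = np.nan_to_num(1 - np.square(cos))
    # Perpendicular distance to the supporting line
    result = np.sqrt(a_sq * b_sq * sin_sq / c_sq)
    # Acute angle at the endpoint side: nearest point is an endpoint
    result[cos >= 0] = np.sqrt(np.fmin(a_sq, b_sq))[cos >= 0]
    return result

# Two points 3 pixels above the segment from (0, 0) to (10, 0):
# (0, 3) is closest to an endpoint, (5, 3) to the line interior; both distances are 3
xs = np.array([[0.0, 5.0]])
ys = np.array([[3.0, 3.0]])
d = point_segment_distance(xs, ys, (0, 0), (10, 0))
```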
mindocr.data.transforms.det_transforms.DetLabelEncode
Source code in mindocr\data\transforms\det_transforms.py
class DetLabelEncode:
    def __init__(self, **kwargs):
        pass

    def order_points_clockwise(self, pts):
        rect = np.zeros((4, 2), dtype="float32")
        s = pts.sum(axis=1)
        rect[0] = pts[np.argmin(s)]
        rect[2] = pts[np.argmax(s)]
        tmp = np.delete(pts, (np.argmin(s), np.argmax(s)), axis=0)
        diff = np.diff(np.array(tmp), axis=1)
        rect[1] = tmp[np.argmin(diff)]
        rect[3] = tmp[np.argmax(diff)]
        return rect

    def expand_points_num(self, boxes):
        max_points_num = 0
        for b in boxes:
            if len(b) > max_points_num:
                max_points_num = len(b)
        ex_boxes = []
        for b in boxes:
            ex_box = b + [b[-1]] * (max_points_num - len(b))
            ex_boxes.append(ex_box)
        return ex_boxes

    def __call__(self, data):
        """
        required keys:
            label (str): string containing points and transcriptions in JSON format
        added keys:
            polys (np.ndarray): polygon boxes in an image, each polygon is represented by points
                            in shape [num_polygons, num_points, 2]
            texts (List(str)): text string
            ignore_tags (np.ndarray[bool]): indicators for ignorable texts (e.g., '###')
        """
        label = data["label"]
        label = json.loads(label)
        nBox = len(label)
        boxes, txts, txt_tags = [], [], []
        for bno in range(0, nBox):
            box = label[bno]["points"]
            txt = label[bno]["transcription"]
            boxes.append(box)
            txts.append(txt)
            if txt in ["*", "###"]:
                txt_tags.append(True)
            else:
                txt_tags.append(False)
        if len(boxes) == 0:
            return None
        boxes = self.expand_points_num(boxes)
        boxes = np.array(boxes, dtype=np.float32)
        txt_tags = np.array(txt_tags, dtype=bool)  # the np.bool alias is removed in recent NumPy

        data["polys"] = boxes
        data["texts"] = txts
        data["ignore_tags"] = txt_tags
        return data
mindocr.data.transforms.det_transforms.DetLabelEncode.__call__(data)
required keys

label (str): string containing points and transcriptions in JSON format

added keys

polys (np.ndarray): polygon boxes in an image, each polygon is represented by points in shape [num_polygons, num_points, 2] texts (List(str)): text string ignore_tags (np.ndarray[bool]): indicators for ignorable texts (e.g., '###')

Source code in mindocr\data\transforms\det_transforms.py
def __call__(self, data):
    """
    required keys:
        label (str): string containing points and transcriptions in JSON format
    added keys:
        polys (np.ndarray): polygon boxes in an image, each polygon is represented by points
                        in shape [num_polygons, num_points, 2]
        texts (List(str)): text string
        ignore_tags (np.ndarray[bool]): indicators for ignorable texts (e.g., '###')
    """
    label = data["label"]
    label = json.loads(label)
    nBox = len(label)
    boxes, txts, txt_tags = [], [], []
    for bno in range(0, nBox):
        box = label[bno]["points"]
        txt = label[bno]["transcription"]
        boxes.append(box)
        txts.append(txt)
        if txt in ["*", "###"]:
            txt_tags.append(True)
        else:
            txt_tags.append(False)
    if len(boxes) == 0:
        return None
    boxes = self.expand_points_num(boxes)
    boxes = np.array(boxes, dtype=np.float32)
    txt_tags = np.array(txt_tags, dtype=bool)  # the np.bool alias is removed in recent NumPy

    data["polys"] = boxes
    data["texts"] = txts
    data["ignore_tags"] = txt_tags
    return data
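As a sketch of what DetLabelEncode consumes and produces: the raw `label` string is ICDAR-style JSON with `points` and `transcription` fields, and transcriptions of `*` or `###` are flagged as ignorable. A minimal reimplementation of the parsing logic, using a hand-made label purely for illustration:

```python
import json
import numpy as np

# Hand-made ICDAR-style label string for illustration
label = json.dumps([
    {"points": [[0, 0], [10, 0], [10, 5], [0, 5]], "transcription": "hello"},
    {"points": [[20, 0], [30, 0], [30, 5], [20, 5]], "transcription": "###"},
])

records = json.loads(label)
polys = np.array([r["points"] for r in records], dtype=np.float32)
texts = [r["transcription"] for r in records]
# '*' and '###' mark unreadable regions to be ignored during training
ignore_tags = np.array([t in ("*", "###") for t in texts], dtype=bool)
```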
mindocr.data.transforms.det_transforms.DetResize

Resize the image and text polygons (if any) for text detection

PARAMETER DESCRIPTION
target_size

target size [H, W] of the output image. If it is not None, limit_type will be forced to None and side limit-based resizing will not take effect. Default: None.

TYPE: list DEFAULT: None

keep_ratio

whether to keep aspect ratio. Default: True

DEFAULT: True

padding

whether to pad the image to the target_size after "keep-ratio" resizing. Only used when keep_ratio is True. Default False.

DEFAULT: False

limit_type

it decides the resize method type. Option: 'min', 'max', None. Default: "min" - 'min': images will be resized by limiting the mininum side length to limit_side_len, i.e., any side of the image must be larger than or equal to limit_side_len. If the input image alreay fulfill this limitation, no scaling will performed. If not, input image will be up-scaled with the ratio of (limit_side_len / shorter side length) - 'max': images will be resized by limiting the maximum side length to limit_side_len, i.e., any side of the image must be smaller than or equal to limit_side_len. If the input image alreay fulfill this limitation, no scaling will performed. If not, input image will be down-scaled with the ratio of (limit_side_len / longer side length) - None: No limitation. Images will be resized to target_size with or without keep_ratio and padding

DEFAULT: 'min'

limit_side_len

side length limitation.

DEFAULT: 736

force_divisable

whether to force resizing the image to a size that is a multiple of divisor (e.g. 32) in the end, which is suitable for some networks (e.g. dbnet-resnet50). Default: True.

DEFAULT: True

divisor

divisor used when force_divisable is enabled. The value is determined by the down-scaling path of the network backbone (e.g. for resnet, the feature map is 2^5 times smaller than the input image). Default is 32.

DEFAULT: 32

interpolation

interpolation method

Note
  1. The default choice limit_type=min with a large limit_side_len is recommended for detection inference for better accuracy,
  2. If target_size is set with keep_ratio=True, limit_type=null, padding=True, this transform works the same as ScalePadImage,
  3. If inference speed is the top priority, you can set limit_type=max with a small limit_side_len like 960.
Source code in mindocr\data\transforms\det_transforms.py
class DetResize:
    """
    Resize the image and text polygons (if any) for text detection

    Args:
        target_size: target size [H, W] of the output image. If it is not None, `limit_type` will be forced to None and
            side limit-based resizing will not take effect. Default: None.
        keep_ratio: whether to keep aspect ratio. Default: True
        padding: whether to pad the image to the `target_size` after "keep-ratio" resizing. Only used when keep_ratio is
            True. Default False.
        limit_type: it decides the resize method type. Option: 'min', 'max', None. Default: "min"
            - 'min': images will be resized by limiting the minimum side length to `limit_side_len`, i.e.,
              every side of the image must be larger than or equal to `limit_side_len`. If the input image already
              fulfills this limitation, no scaling is performed; otherwise, the image is up-scaled by a ratio of
              (limit_side_len / shorter side length)
            - 'max': images will be resized by limiting the maximum side length to `limit_side_len`, i.e.,
              every side of the image must be smaller than or equal to `limit_side_len`. If the input image already
              fulfills this limitation, no scaling is performed; otherwise, the image is down-scaled by a ratio of
              (limit_side_len / longer side length)
            -  None: No limitation. Images will be resized to `target_size` with or without `keep_ratio` and `padding`
        limit_side_len: side len limitation.
        force_divisable: whether to force resizing the image to a size that is a multiple of `divisor` (e.g. 32)
            in the end, which is suitable for some networks (e.g. dbnet-resnet50). Default: True.
        divisor: divisor used when `force_divisable` enabled. The value is decided by the down-scaling path of
            the network backbone (e.g. resnet, feature map size is 2^5 smaller than input image size). Default is 32.
        interpolation: interpolation method

    Note:
        1. The default choice limit_type='min' with a large `limit_side_len` is recommended for detection inference
        for better accuracy,
        2. If target_size is set with keep_ratio=True, limit_type=null, padding=True, this transform works the same as
        ScalePadImage,
        3. If inference speed is the top priority, you can set limit_type='max' with a small
        `limit_side_len` like 960.
    """

    def __init__(
        self,
        target_size: list = None,
        keep_ratio=True,
        padding=False,
        limit_type="min",
        limit_side_len=736,
        force_divisable=True,
        divisor=32,
        interpolation=cv2.INTER_LINEAR,
        **kwargs,
    ):
        if target_size is not None:
            limit_type = None

        self.target_size = target_size
        self.keep_ratio = keep_ratio
        self.padding = padding
        self.limit_side_len = limit_side_len
        self.limit_type = limit_type
        self.interpolation = interpolation
        self.force_divisable = force_divisable
        self.divisor = divisor

        self.is_train = kwargs.get("is_train", False)
        assert target_size is None or limit_type is None, "Only one of limit_type and target_size should be provided."
        if limit_type in ["min", "max"]:
            keep_ratio = True
            padding = False
            print(
                f"INFO: `limit_type` is {limit_type}. Image will be resized by limiting the {limit_type} "
                f"side length to {limit_side_len}."
            )
        elif not limit_type:
            assert target_size is not None or force_divisable is not None, (
                "One of `target_size` or `force_divisable` is required when limit_type is not set. "
                "Please set at least one of them."
            )
            if target_size and force_divisable:
                if (target_size[0] % divisor != 0) or (target_size[1] % divisor != 0):
                    self.target_size = [max(round(x / self.divisor) * self.divisor, self.divisor) for x in target_size]
                    print(
                        f"WARNING: `force_divisable` is enabled but the set target size {target_size} "
                        f"is not divisible by {divisor}. Target size is adjusted to {self.target_size}"
                    )
            if (target_size is not None) and keep_ratio and (not padding):
                print("WARNING: output shape can be dynamic if keep_ratio but no padding.")
        else:
            raise ValueError(f"Unknown limit_type: {limit_type}")

    def __call__(self, data: dict):
        """
        required keys:
            image: shape HWC
            polys: shape [num_polys, num_points, 2] (optional)
        modified keys:
            image
            (polys)
        added keys:
            shape: [src_h, src_w, scale_ratio_h, scale_ratio_w]
        """
        img = data["image"]
        h, w = img.shape[:2]
        if self.target_size:
            tar_h, tar_w = self.target_size

        scale_ratio = 1.0
        allow_padding = False
        if self.limit_type == "min":
            if min(h, w) < self.limit_side_len:  # upscale
                scale_ratio = self.limit_side_len / float(min(h, w))
        elif self.limit_type == "max":
            if max(h, w) > self.limit_side_len:  # downscale
                scale_ratio = self.limit_side_len / float(max(h, w))
        elif not self.limit_type:
            if self.keep_ratio and self.target_size:
                # scale the image until it fits in the target size at most. The left part could be filled by padding.
                scale_ratio = min(tar_h / h, tar_w / w)
                allow_padding = True

        if (self.limit_type in ["min", "max"]) or (self.target_size and self.keep_ratio):
            resize_w = math.ceil(w * scale_ratio)
            resize_h = math.ceil(h * scale_ratio)
            if self.target_size:
                resize_w = min(resize_w, tar_w)
                resize_h = min(resize_h, tar_h)
        elif self.target_size:
            resize_w = tar_w
            resize_h = tar_h
        else:  # both target_size and limit_type is None. resize by force_divisable
            resize_w = w
            resize_h = h

        if self.force_divisable:
            if not (
                allow_padding and self.padding
            ):  # no need to round: the image will be padded to the target size, which is divisible
                # adjust the size slightly so that both sides of the image are divisable by divisor
                # e.g. 32, which could be required by the network
                resize_h = max(
                    math.ceil(resize_h / self.divisor) * self.divisor, self.divisor
                )  # diff from resize_image_type0 in pp which uses round()
                resize_w = max(math.ceil(resize_w / self.divisor) * self.divisor, self.divisor)

        resized_img = cv2.resize(img, (resize_w, resize_h), interpolation=self.interpolation)

        if allow_padding and self.padding:
            if self.target_size and (tar_h >= resize_h and tar_w >= resize_w):
                # do padding
                padded_img = np.zeros((tar_h, tar_w, 3), dtype=np.uint8)
                padded_img[:resize_h, :resize_w, :] = resized_img
                data["image"] = padded_img
            else:
                print(
                    f"WARNING: Image shape after resize is ({resize_h}, {resize_w}), "
                    f"which is larger than target_size {self.target_size}. Skip padding for the current image. "
                    f"You may disable `force_divisable` to avoid this warning."
                )
        else:
            data["image"] = resized_img

        scale_h = resize_h / h
        scale_w = resize_w / w

        # Only need to transform ground truth polygons in training for generating masks/maps.
        # For evaluation, we should not change the GT polygons.
        # The metric with input of GT polygons and predicted polygons must be computed in the original image space
        # for consistent comparison.
        if "polys" in data and self.is_train:
            data["polys"][:, :, 0] = data["polys"][:, :, 0] * scale_w
            data["polys"][:, :, 1] = data["polys"][:, :, 1] * scale_h

        if "shape_list" not in data:
            src_h, src_w = data.get("raw_img_shape", (h, w))
            data["shape_list"] = np.array([src_h, src_w, scale_h, scale_w], dtype=np.float32)
        else:
            data["shape_list"][2] = data["shape_list"][2] * scale_h
            data["shape_list"][3] = data["shape_list"][3] * scale_w

        return data
mindocr.data.transforms.det_transforms.DetResize.__call__(data)
required keys

image: shape HWC polys: shape [num_polys, num_points, 2] (optional)

modified keys

image (polys)

added keys

shape: [src_h, src_w, scale_ratio_h, scale_ratio_w]
Source code in mindocr\data\transforms\det_transforms.py
def __call__(self, data: dict):
    """
    required keys:
        image: shape HWC
        polys: shape [num_polys, num_points, 2] (optional)
    modified keys:
        image
        (polys)
    added keys:
        shape: [src_h, src_w, scale_ratio_h, scale_ratio_w]
    """
    img = data["image"]
    h, w = img.shape[:2]
    if self.target_size:
        tar_h, tar_w = self.target_size

    scale_ratio = 1.0
    allow_padding = False
    if self.limit_type == "min":
        if min(h, w) < self.limit_side_len:  # upscale
            scale_ratio = self.limit_side_len / float(min(h, w))
    elif self.limit_type == "max":
        if max(h, w) > self.limit_side_len:  # downscale
            scale_ratio = self.limit_side_len / float(max(h, w))
    elif not self.limit_type:
        if self.keep_ratio and self.target_size:
            # scale the image until it fits in the target size at most. The left part could be filled by padding.
            scale_ratio = min(tar_h / h, tar_w / w)
            allow_padding = True

    if (self.limit_type in ["min", "max"]) or (self.target_size and self.keep_ratio):
        resize_w = math.ceil(w * scale_ratio)
        resize_h = math.ceil(h * scale_ratio)
        if self.target_size:
            resize_w = min(resize_w, tar_w)
            resize_h = min(resize_h, tar_h)
    elif self.target_size:
        resize_w = tar_w
        resize_h = tar_h
    else:  # both target_size and limit_type is None. resize by force_divisable
        resize_w = w
        resize_h = h

    if self.force_divisable:
        if not (
            allow_padding and self.padding
        ):  # no need to round: the image will be padded to the target size, which is divisible
            # adjust the size slightly so that both sides of the image are divisable by divisor
            # e.g. 32, which could be required by the network
            resize_h = max(
                math.ceil(resize_h / self.divisor) * self.divisor, self.divisor
            )  # diff from resize_image_type0 in pp which uses round()
            resize_w = max(math.ceil(resize_w / self.divisor) * self.divisor, self.divisor)

    resized_img = cv2.resize(img, (resize_w, resize_h), interpolation=self.interpolation)

    if allow_padding and self.padding:
        if self.target_size and (tar_h >= resize_h and tar_w >= resize_w):
            # do padding
            padded_img = np.zeros((tar_h, tar_w, 3), dtype=np.uint8)
            padded_img[:resize_h, :resize_w, :] = resized_img
            data["image"] = padded_img
        else:
            print(
                f"WARNING: Image shape after resize is ({resize_h}, {resize_w}), "
                f"which is larger than target_size {self.target_size}. Skip padding for the current image. "
                f"You may disable `force_divisable` to avoid this warning."
            )
    else:
        data["image"] = resized_img

    scale_h = resize_h / h
    scale_w = resize_w / w

    # Only need to transform ground truth polygons in training for generating masks/maps.
    # For evaluation, we should not change the GT polygons.
    # The metric with input of GT polygons and predicted polygons must be computed in the original image space
    # for consistent comparison.
    if "polys" in data and self.is_train:
        data["polys"][:, :, 0] = data["polys"][:, :, 0] * scale_w
        data["polys"][:, :, 1] = data["polys"][:, :, 1] * scale_h

    if "shape_list" not in data:
        src_h, src_w = data.get("raw_img_shape", (h, w))
        data["shape_list"] = np.array([src_h, src_w, scale_h, scale_w], dtype=np.float32)
    else:
        data["shape_list"][2] = data["shape_list"][2] * scale_h
        data["shape_list"][3] = data["shape_list"][3] * scale_w

    return data
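To make the `limit_type='min'` + `force_divisable` interaction above concrete, here is a small sketch of the output-shape computation (the helper name `det_resize_shape` is illustrative; it mirrors the ceil-based rounding in the source rather than the exact class API):

```python
import math

def det_resize_shape(h, w, limit_side_len=736, divisor=32):
    # limit_type='min': upscale so the shorter side reaches limit_side_len
    scale = limit_side_len / min(h, w) if min(h, w) < limit_side_len else 1.0
    resize_h = math.ceil(h * scale)
    resize_w = math.ceil(w * scale)
    # force_divisable: round both sides up to a multiple of divisor
    resize_h = max(math.ceil(resize_h / divisor) * divisor, divisor)
    resize_w = max(math.ceil(resize_w / divisor) * divisor, divisor)
    return resize_h, resize_w

# 368x500 is up-scaled by 2 to 736x1000, then 1000 rounds up to 1024
shape = det_resize_shape(368, 500)
```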
mindocr.data.transforms.det_transforms.GridResize

Bases: DetResize

Resize image to make it divisible by a specified factor exactly. Resize polygons correspondingly, if provided.

Source code in mindocr\data\transforms\det_transforms.py
class GridResize(DetResize):
    """
    Resize image to make it divisible by a specified factor exactly.
    Resize polygons correspondingly, if provided.
    """

    def __init__(self, factor: int = 32, **kwargs):
        super().__init__(
            target_size=None,
            keep_ratio=False,
            padding=False,
            limit_type=None,
            force_divisable=True,
            divisor=factor,
        )
mindocr.data.transforms.det_transforms.RandomCropWithBBox

Randomly cuts a crop from an image along with polygons in the way that the crop doesn't intersect any polygons (i.e. any given polygon is either fully inside or fully outside the crop).

PARAMETER DESCRIPTION
max_tries

number of attempts to cut a crop containing at least one polygon. If all attempts fail, the whole image is scaled to match the crop_size.

DEFAULT: 10

min_crop_ratio

minimum size of a crop with respect to the input image size.

DEFAULT: 0.1

crop_size

target size of the crop (resized and padded, if needed), preserving the aspect ratio.

DEFAULT: (640, 640)

p

probability of the augmentation being applied to an image.

TYPE: float DEFAULT: 0.5

Source code in mindocr\data\transforms\det_transforms.py
class RandomCropWithBBox:
    """
    Randomly cuts a crop from an image along with polygons in the way that the crop doesn't intersect any polygons
    (i.e. any given polygon is either fully inside or fully outside the crop).

    Args:
        max_tries: number of attempts to try to cut a crop with a polygon in it. If fails, scales the whole image to
                   match the `crop_size`.
        min_crop_ratio: minimum size of a crop in respect to an input image size.
        crop_size: target size of the crop (resized and padded, if needed), preserves sides ratio.
        p: probability of the augmentation being applied to an image.
    """

    def __init__(self, max_tries=10, min_crop_ratio=0.1, crop_size=(640, 640), p: float = 0.5, **kwargs):
        self._crop_size = crop_size
        self._ratio = min_crop_ratio
        self._max_tries = max_tries
        self._p = p

    def __call__(self, data):
        if random.random() < self._p:  # cut a crop
            start, end = self._find_crop(data)
        else:  # scale and pad the whole image
            start, end = np.array([0, 0]), np.array(data["image"].shape[:2])

        scale = min(self._crop_size / (end - start))

        data["image"] = cv2.resize(data["image"][start[0] : end[0], start[1] : end[1]], None, fx=scale, fy=scale)
        data["actual_size"] = np.array(data["image"].shape[:2])
        data["image"] = np.pad(
            data["image"], (*tuple((0, cs - ds) for cs, ds in zip(self._crop_size, data["image"].shape[:2])), (0, 0))
        )

        data["polys"] = (data["polys"] - start[::-1]) * scale

        return data

    def _find_crop(self, data):
        size = np.array(data["image"].shape[:2])
        polys = [poly for poly, ignore in zip(data["polys"], data["ignore_tags"]) if not ignore]

        if polys:
            # do not crop through polys => find available "empty" coordinates
            h_array, w_array = np.zeros(size[0], dtype=np.int32), np.zeros(size[1], dtype=np.int32)
            for poly in polys:
                points = np.maximum(np.round(poly).astype(np.int32), 0)
                w_array[points[:, 0].min() : points[:, 0].max() + 1] = 1
                h_array[points[:, 1].min() : points[:, 1].max() + 1] = 1

            if not h_array.all() and not w_array.all():  # if texts do not occupy full image
                # find available coordinates that don't include text
                h_avail = np.where(h_array == 0)[0]
                w_avail = np.where(w_array == 0)[0]

                min_size = np.ceil(size * self._ratio).astype(np.int32)
                for _ in range(self._max_tries):
                    y = np.sort(np.random.choice(h_avail, size=2))
                    x = np.sort(np.random.choice(w_avail, size=2))
                    start, end = np.array([y[0], x[0]]), np.array([y[1], x[1]])

                    if ((end - start) < min_size).any():  # NOQA
                        continue

                    # check that at least one polygon is within the crop
                    for poly in polys:
                        if (poly.max(axis=0) > start[::-1]).all() and (poly.min(axis=0) < end[::-1]).all():  # NOQA
                            return start, end

        # failed to generate a crop or all polys are marked as ignored
        return np.array([0, 0]), size
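The crop search above relies on a projection trick: rows and columns covered by any polygon are marked in `h_array`/`w_array`, and crop boundaries are sampled only from unmarked positions, so a crop edge can never cut through a polygon. A minimal sketch of that idea with one hand-placed box:

```python
import numpy as np

np.random.seed(0)
h, w = 100, 100
# Mark rows/columns covered by text (a single hand-placed box: y in 20..40, x in 30..60)
h_array = np.zeros(h, dtype=np.int32)
w_array = np.zeros(w, dtype=np.int32)
h_array[20:41] = 1
w_array[30:61] = 1

# Crop boundaries may only land on unmarked rows/columns,
# so no crop edge can cut through the box
h_avail = np.where(h_array == 0)[0]
w_avail = np.where(w_array == 0)[0]

y = np.sort(np.random.choice(h_avail, size=2))
x = np.sort(np.random.choice(w_avail, size=2))
start, end = np.array([y[0], x[0]]), np.array([y[1], x[1]])
```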
mindocr.data.transforms.det_transforms.ScalePadImage

Bases: DetResize

Scale image and polys by the shorter side, then pad to the target_size. Input image format: HWC.

PARAMETER DESCRIPTION
target_size

[H, W] of the output image.

TYPE: list

Source code in mindocr\data\transforms\det_transforms.py
class ScalePadImage(DetResize):
    """
    Scale image and polys by the shorter side, then pad to the target_size.
    input image format: hwc

    Args:
        target_size: [H, W] of the output image.
    """

    def __init__(self, target_size: list, **kwargs):
        super().__init__(
            target_size=target_size,
            keep_ratio=True,
            padding=True,
            limit_type=None,
            force_divisable=False,
        )
mindocr.data.transforms.det_transforms.ShrinkBinaryMap

Makes a binary mask from detection data in ICDAR format. Typically follows the process of class MakeICDARData.

Source code in mindocr\data\transforms\det_transforms.py
class ShrinkBinaryMap:
    """
    Making binary mask from detection data with ICDAR format.
    Typically following the process of class `MakeICDARData`.
    """

    def __init__(self, min_text_size=8, shrink_ratio=0.4, **kwargs):
        self._min_text_size = min_text_size
        self._dist_coef = 1 - shrink_ratio**2

    def __call__(self, data):
        gt = np.zeros(data["image"].shape[:2], dtype=np.float32)
        mask = np.ones(data["image"].shape[:2], dtype=np.float32)

        if len(data["polys"]):
            for i in range(len(data["polys"])):
                min_side = min(np.max(data["polys"][i], axis=0) - np.min(data["polys"][i], axis=0))

                if data["ignore_tags"][i] or min_side < self._min_text_size:
                    cv2.fillPoly(mask, [data["polys"][i].astype(np.int32)], 0)
                    data["ignore_tags"][i] = True
                else:
                    poly = Polygon(data["polys"][i])
                    shrunk = expand_poly(data["polys"][i], distance=-self._dist_coef * poly.area / poly.length)

                    if shrunk:
                        cv2.fillPoly(gt, [np.array(shrunk[0], dtype=np.int32)], 1)
                    else:
                        cv2.fillPoly(mask, [data["polys"][i].astype(np.int32)], 0)
                        data["ignore_tags"][i] = True

        data["binary_map"] = np.expand_dims(gt, axis=0)
        data["mask"] = mask
        return data
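The shrink offset applied above follows the DB formulation: the polygon is shrunk inward by D = A(1 - r^2)/L, where A is the polygon area, L its perimeter, and r the shrink ratio. A dependency-free sketch of that computation (the helper name `shrink_distance` is illustrative, not part of mindocr):

```python
def shrink_distance(poly, shrink_ratio=0.4):
    """Offset distance D = A * (1 - r**2) / L for a polygon [(x, y), ...]."""
    n = len(poly)
    area, perimeter = 0.0, 0.0
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        area += x1 * y2 - x2 * y1  # shoelace term
        perimeter += ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
    area = abs(area) / 2.0
    return area * (1 - shrink_ratio ** 2) / perimeter

# 100x50 rectangle: A = 5000, L = 300 -> D = 5000 * 0.84 / 300 = 14
d = shrink_distance([(0, 0), (100, 0), (100, 50), (0, 50)])
```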
mindocr.data.transforms.det_transforms.ValidatePolygons
Validate polygons by
  1. filtering out polygons outside an image.
  2. clipping coordinates of polygons that are partially outside an image to stay within the visible region.
PARAMETER DESCRIPTION
min_area

minimum area below which newly clipped polygons are considered ignored.

TYPE: float DEFAULT: 1.0

clip_to_visible_area

(Experimental) clip polygons to a visible area. Number of vertices in a polygon after clipping may change.

TYPE: bool DEFAULT: False

min_vertices

minimum number of vertices in a polygon below which newly clipped polygons are considered ignored.

TYPE: int DEFAULT: 4

Source code in mindocr\data\transforms\det_transforms.py
class ValidatePolygons:
    """
    Validate polygons by:
     1. filtering out polygons outside an image.
     2. clipping coordinates of polygons that are partially outside an image to stay within the visible region.
    Args:
        min_area: minimum area below which newly clipped polygons considered as ignored.
        clip_to_visible_area: (Experimental) clip polygons to a visible area. Number of vertices in a polygon after
                              clipping may change.
        min_vertices: minimum number of vertices in a polygon below which newly clipped polygons considered as ignored.
    """

    def __init__(self, min_area: float = 1.0, clip_to_visible_area: bool = False, min_vertices: int = 4, **kwargs):
        self._min_area = min_area
        self._min_vert = min_vertices
        self._clip = clip_to_visible_area

    def __call__(self, data: dict):
        size = data.get("actual_size", np.array(data["image"].shape[:2]))[::-1]  # convert to x, y coord
        border = box(0, 0, *size)

        new_polys, new_texts, new_tags = [], [], []
        for np_poly, text, ignore in zip(data["polys"], data["texts"], data["ignore_tags"]):
            poly = Polygon(np_poly)
            if poly.intersects(border):  # if the polygon is fully or partially within the image
                poly = poly.intersection(border)
                if poly.area < self._min_area:
                    ignore = True

                if self._clip:  # Clip polygon to a visible area
                    poly = poly.exterior.coords
                    np_poly = np.array(poly[:-1])
                    if len(np_poly) < self._min_vert:
                        ignore = True

                new_polys.append(np_poly)
                new_tags.append(ignore)
                new_texts.append(text)

        data["polys"] = new_polys
        data["texts"] = new_texts
        data["ignore_tags"] = np.array(new_tags)

        return data
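The `min_area` check above relies on shapely's `Polygon.area`. For intuition, the same quantity can be computed with the shoelace formula; this NumPy sketch is illustrative and not part of mindocr:

```python
import numpy as np

def polygon_area(pts: np.ndarray) -> float:
    """Shoelace formula: area of a simple polygon given (N, 2) vertices."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

# A 3x2 rectangle has area 6; a clipped polygon whose area falls below
# min_area would be marked as ignored by ValidatePolygons.
rect = np.array([[0, 0], [3, 0], [3, 2], [0, 2]], dtype=float)
print(polygon_area(rect))  # 6.0
```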
mindocr.data.transforms.general_transforms
mindocr.data.transforms.general_transforms.DecodeImage

img_mode (str): The channel order of the output, either 'BGR' or 'RGB'. Defaults to 'BGR'. channel_first (bool): if True, the image shape is CHW; if False, HWC. Defaults to False.

Source code in mindocr\data\transforms\general_transforms.py
class DecodeImage:
    """
    img_mode (str): The channel order of the output, either 'BGR' or 'RGB'. Defaults to 'BGR'.
    channel_first (bool): if True, the image shape is CHW; if False, HWC. Defaults to False.
    """

    def __init__(
        self, img_mode="BGR", channel_first=False, to_float32=False, ignore_orientation=False, keep_ori=False, **kwargs
    ):
        self.img_mode = img_mode
        self.to_float32 = to_float32
        self.channel_first = channel_first
        self.flag = cv2.IMREAD_IGNORE_ORIENTATION | cv2.IMREAD_COLOR if ignore_orientation else cv2.IMREAD_COLOR
        self.keep_ori = keep_ori

    def __call__(self, data):
        if "img_path" in data:
            with open(data["img_path"], "rb") as f:
                img = f.read()
        elif "img_lmdb" in data:
            img = data["img_lmdb"]
        img = np.frombuffer(img, dtype="uint8")
        img = cv2.imdecode(img, self.flag)

        if self.img_mode == "RGB":
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

        if self.channel_first:
            img = img.transpose((2, 0, 1))

        if self.to_float32:
            img = img.astype("float32")
        data["image"] = img
        # data['ori_image'] = img.copy()
        data["raw_img_shape"] = img.shape[:2]

        if self.keep_ori:
            data["image_ori"] = img.copy()

        return data
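The `channel_first` branch is a plain axis transpose. A minimal sketch with a dummy array (shapes are illustrative):

```python
import numpy as np

hwc = np.zeros((32, 100, 3), dtype=np.uint8)  # H, W, C as produced by cv2.imdecode
chw = hwc.transpose((2, 0, 1))                # C, H, W when channel_first=True
print(chw.shape)  # (3, 32, 100)
```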
mindocr.data.transforms.general_transforms.NormalizeImage

Normalize an image: subtract the mean and divide by the std. Input image: by default np.uint8, [0, 255], HWC format. Returned image: float32 numpy array.

Source code in mindocr\data\transforms\general_transforms.py
class NormalizeImage:
    """
    normalize image, subtract mean, divide std
    input image: by default, np.uint8, [0, 255], HWC format.
    return image: float32 numpy array
    """

    def __init__(
        self,
        mean: Union[List[float], str] = "imagenet",
        std: Union[List[float], str] = "imagenet",
        is_hwc=True,
        bgr_to_rgb=False,
        rgb_to_bgr=False,
        **kwargs,
    ):
        # By default, the ImageNet MEAN and STD are in RGB order. Invert if the input image is in BGR mode
        self._channel_conversion = False
        if bgr_to_rgb or rgb_to_bgr:
            self._channel_conversion = True

        # TODO: detect hwc or chw automatically
        shape = (3, 1, 1) if not is_hwc else (1, 1, 3)
        self.mean = np.array(self._get_value(mean, "mean")).reshape(shape).astype("float32")
        self.std = np.array(self._get_value(std, "std")).reshape(shape).astype("float32")
        self.is_hwc = is_hwc

    def __call__(self, data):
        img = data["image"]
        if isinstance(img, Image.Image):
            img = np.array(img)
        assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage"

        if self._channel_conversion:
            if self.is_hwc:
                img = img[..., [2, 1, 0]]
            else:
                img = img[[2, 1, 0], ...]

        data["image"] = (img.astype("float32") - self.mean) / self.std
        return data

    @staticmethod
    def _get_value(val, name):
        if isinstance(val, str) and val.lower() == "imagenet":
            assert name in ["mean", "std"]
            return IMAGENET_DEFAULT_MEAN if name == "mean" else IMAGENET_DEFAULT_STD
        elif isinstance(val, list):
            return val
        else:
            raise ValueError(f"Wrong {name} value: {val}")
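The normalization in `__call__` is elementwise. A minimal NumPy sketch; the mean/std values below are the commonly used 0-255-scale ImageNet statistics and are assumed to match mindocr's `IMAGENET_DEFAULT_MEAN`/`IMAGENET_DEFAULT_STD`:

```python
import numpy as np

# ImageNet statistics in the 0-255 scale (assumed values; mindocr imports
# the exact constants as IMAGENET_DEFAULT_MEAN / IMAGENET_DEFAULT_STD)
mean = np.array([123.675, 116.28, 103.53], dtype=np.float32).reshape(1, 1, 3)
std = np.array([58.395, 57.12, 57.375], dtype=np.float32).reshape(1, 1, 3)

img = np.full((2, 2, 3), 255, dtype=np.uint8)  # a white HWC image
out = (img.astype(np.float32) - mean) / std    # same arithmetic as __call__
print(out.dtype)  # float32
```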
mindocr.data.transforms.general_transforms.PackLoaderInputs
PARAMETER DESCRIPTION
output_columns

the keys in the data dict that are expected to be output to the dataloader

TYPE: list

Call
Source code in mindocr\data\transforms\general_transforms.py
class PackLoaderInputs:
    """
    Args:
        output_columns (list): the keys in the data dict that are expected to be output to the dataloader

    Call:
        input: data dict
        output: data tuple corresponding to the `output_columns`
    """

    def __init__(self, output_columns: List, **kwargs):
        self.output_columns = output_columns

    def __call__(self, data):
        out = []
        for k in self.output_columns:
        assert k in data, f"key {k} does not exist in data, available keys are {data.keys()}"
            out.append(data[k])

        return tuple(out)
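The selection `PackLoaderInputs` performs can be sketched in a few lines (the data dict below is hypothetical):

```python
# Hypothetical data dict as produced by earlier transforms
data = {"image": "img-array", "polys": "poly-array", "texts": ["hello"]}
output_columns = ["image", "polys"]

# The same selection PackLoaderInputs performs in __call__
out = tuple(data[k] for k in output_columns)
print(out)  # ('img-array', 'poly-array')
```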
mindocr.data.transforms.general_transforms.RandomColorAdjust
Source code in mindocr\data\transforms\general_transforms.py
class RandomColorAdjust:
    def __init__(self, brightness=32.0 / 255, saturation=0.5, **kwargs):
        contrast = kwargs.get("contrast", (1, 1))
        hue = kwargs.get("hue", (0, 0))
        self._jitter = MSRandomColorAdjust(brightness=brightness, saturation=saturation, contrast=contrast, hue=hue)
        self._pil = ToPIL()

    def __call__(self, data):
        """
        required keys: image
        modified keys: image
        """
        # there's a bug in MindSpore that requires images to be converted to the PIL format first
        data["image"] = np.array(self._jitter(self._pil(data["image"])))
        return data
mindocr.data.transforms.general_transforms.RandomColorAdjust.__call__(data)

required keys: image modified keys: image

Source code in mindocr\data\transforms\general_transforms.py
def __call__(self, data):
    """
    required keys: image
    modified keys: image
    """
    # there's a bug in MindSpore that requires images to be converted to the PIL format first
    data["image"] = np.array(self._jitter(self._pil(data["image"])))
    return data
mindocr.data.transforms.general_transforms.RandomHorizontalFlip

Random horizontal flip of an image with polygons in it (if any).

PARAMETER DESCRIPTION
p

probability of the augmentation being applied to an image.

TYPE: float DEFAULT: 0.5

Source code in mindocr\data\transforms\general_transforms.py
class RandomHorizontalFlip:
    """
    Random horizontal flip of an image with polygons in it (if any).
    Args:
        p: probability of the augmentation being applied to an image.
    """

    def __init__(self, p: float = 0.5, **kwargs):
        self._p = p

    def __call__(self, data: dict) -> dict:
        if random.random() < self._p:
            data["image"] = cv2.flip(data["image"], 1)

            if "polys" in data:
                mat = np.float32([[-1, 0, data["image"].shape[1] - 1], [0, 1, 0]])
                data["polys"] = cv2.transform(data["polys"], mat)
                # TODO: assign a new starting point located in the top left
                data["polys"] = data["polys"][:, ::-1, :]  # preserve the original order (e.g. clockwise)

        return data
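The affine matrix `[[-1, 0, w - 1], [0, 1, 0]]` maps each x coordinate to `(w - 1) - x`. A NumPy sketch of the polygon flip without cv2 (values are illustrative):

```python
import numpy as np

w = 100  # image width (illustrative)
polys = np.array([[[10, 5], [30, 5], [30, 20], [10, 20]]], dtype=np.float32)

# cv2.transform with mat = [[-1, 0, w - 1], [0, 1, 0]] maps x -> (w - 1) - x
flipped = polys.copy()
flipped[..., 0] = (w - 1) - flipped[..., 0]
# reverse the vertex order to preserve the original winding (e.g. clockwise)
flipped = flipped[:, ::-1, :]
print(flipped[0, 0])  # [89. 20.]
```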
mindocr.data.transforms.general_transforms.RandomRotate

Randomly rotate an image with polygons in it (if any).

PARAMETER DESCRIPTION
degrees

range of angles [min, max]

DEFAULT: (-10, 10)

expand_canvas

whether to expand canvas during rotation (the image size will be increased) or maintain the original size (the rotated image will be cropped back to the original size).

DEFAULT: True

p

probability of the augmentation being applied to an image.

TYPE: float DEFAULT: 1.0

Source code in mindocr\data\transforms\general_transforms.py
class RandomRotate:
    """
    Randomly rotate an image with polygons in it (if any).
    Args:
        degrees: range of angles [min, max]
        expand_canvas: whether to expand canvas during rotation (the image size will be increased) or
                       maintain the original size (the rotated image will be cropped back to the original size).
        p: probability of the augmentation being applied to an image.
    """

    def __init__(self, degrees=(-10, 10), expand_canvas=True, p: float = 1.0, **kwargs):
        self._degrees = degrees
        self._canvas = expand_canvas
        self._p = p

    def __call__(self, data: dict) -> dict:
        if random.random() < self._p:
            angle = random.randint(self._degrees[0], self._degrees[1])
            h, w = data["image"].shape[:2]

            center = w // 2, h // 2  # x, y
            mat = cv2.getRotationMatrix2D(center, angle, 1)

            if self._canvas:
                # compute the new bounding dimensions of the image
                cos, sin = np.abs(mat[0, 0]), np.abs(mat[0, 1])
                w, h = int((h * sin) + (w * cos)), int((h * cos) + (w * sin))

                # adjust the rotation matrix to take into account translation
                mat[0, 2] += (w / 2) - center[0]
                mat[1, 2] += (h / 2) - center[1]

            data["image"] = cv2.warpAffine(data["image"], mat, (w, h))

            if "polys" in data:
                data["polys"] = cv2.transform(data["polys"], mat)

        return data
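The expanded-canvas computation above can be isolated into a small helper; this sketch mirrors the cos/sin arithmetic in `__call__` (the function name is illustrative):

```python
import math

def expanded_canvas(w: int, h: int, angle_deg: float) -> tuple:
    """New (w, h) canvas size that fully contains a w x h image rotated by angle_deg."""
    cos = abs(math.cos(math.radians(angle_deg)))
    sin = abs(math.sin(math.radians(angle_deg)))
    return int(h * sin + w * cos), int(h * cos + w * sin)

print(expanded_canvas(100, 50, 90))  # (50, 100) -- a 90-degree turn swaps the sides
```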
mindocr.data.transforms.general_transforms.RandomScale

Randomly scales an image and its polygons in a predefined scale range.

PARAMETER DESCRIPTION
scale_range

(min, max) scale range.

TYPE: Union[tuple, list]

p

probability of the augmentation being applied to an image.

TYPE: float DEFAULT: 0.5

Source code in mindocr\data\transforms\general_transforms.py
class RandomScale:
    """
    Randomly scales an image and its polygons in a predefined scale range.
    Args:
        scale_range: (min, max) scale range.
        p: probability of the augmentation being applied to an image.
    """

    def __init__(self, scale_range: Union[tuple, list], p: float = 0.5, **kwargs):
        self._range = scale_range
        self._p = p
        assert kwargs.get("is_train", True), ValueError("RandomScale augmentation must be used for training only")

    def __call__(self, data: dict):
        """
        required keys:
            image, HWC
            (polys)
        modified keys:
            image
            (polys)
        """
        if random.random() < self._p:
            scale = np.random.uniform(*self._range)
            data["image"] = cv2.resize(data["image"], dsize=None, fx=scale, fy=scale)
            if "polys" in data:
                data["polys"] *= scale

        return data
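Scaling the image and its polygons by the same factor keeps them aligned. A minimal sketch of the arithmetic, without cv2 (values are illustrative):

```python
import numpy as np

scale = 1.5  # a draw from np.random.uniform(*scale_range)
h, w = 32, 100
polys = np.array([[[10.0, 4.0], [20.0, 4.0], [20.0, 8.0], [10.0, 8.0]]])

# cv2.resize(..., fx=scale, fy=scale) yields an image of (h * scale, w * scale);
# polygon coordinates must be multiplied by the same factor to stay aligned
new_size = (int(h * scale), int(w * scale))  # (48, 150)
scaled_polys = polys * scale
```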
mindocr.data.transforms.general_transforms.RandomScale.__call__(data)
required keys

image, HWC (polys)

modified keys

image (polys)

Source code in mindocr\data\transforms\general_transforms.py
def __call__(self, data: dict):
    """
    required keys:
        image, HWC
        (polys)
    modified keys:
        image
        (polys)
    """
    if random.random() < self._p:
        scale = np.random.uniform(*self._range)
        data["image"] = cv2.resize(data["image"], dsize=None, fx=scale, fy=scale)
        if "polys" in data:
            data["polys"] *= scale

    return data
mindocr.data.transforms.rec_transforms

transform for text recognition tasks.

mindocr.data.transforms.rec_transforms.RecAttnLabelEncode
Source code in mindocr\data\transforms\rec_transforms.py
class RecAttnLabelEncode:
    def __init__(
        self,
        max_text_len: int = 25,
        character_dict_path: Optional[str] = None,
        use_space_char: bool = False,
        lower: bool = False,
        **kwargs,
    ) -> None:
        """
        Convert text label (str) to a sequence of character indices according to the char dictionary

        Args:
            max_text_len: pad the label text to a fixed length (max_text_len) for attention loss computation.
            character_dict_path: path to the dictionary; if None, a dictionary containing 36 chars
                (i.e., "0123456789abcdefghijklmnopqrstuvwxyz") will be used.
            use_space_char (bool): if True, add the space char to the dict to recognize spaces between two words
            lower (bool): if True, all upper-case chars in the label text will be converted to lower case.
                Set to True if the dictionary only contains lower-case chars; set to False to recognize
                both upper-case and lower-case characters.

        Attributes:
            go_idx: the index of the GO token
            stop_idx: the index of the STOP token
            num_valid_chars: the number of valid characters (including the space char if used) in the dictionary
            num_classes: the number of classes (the valid characters plus the special GO and STOP tokens),
                so num_classes = num_valid_chars + 2
        """
        self.max_text_len = max_text_len
        self.lower = lower

        # read dict
        if character_dict_path is None:
            char_list = list("0123456789abcdefghijklmnopqrstuvwxyz")

            self.lower = True
            print("INFO: The character_dict_path is None, the model can only recognize digits and lower-case letters")
        else:
            # parse char dictionary
            char_list = []
            with open(character_dict_path, "r") as f:
                for line in f:
                    c = line.rstrip("\n\r")
                    char_list.append(c)

        # add space char if set
        if use_space_char:
            if " " not in char_list:
                char_list.append(" ")
            self.space_idx = len(char_list) + 1
        else:
            if " " in char_list:
                print(
                    "WARNING: The dictionary still contains the space char although use_space_char is set to False, "
                    "because the space char is coded in the dictionary file ",
                    character_dict_path,
                )

        self.num_valid_chars = len(char_list)  # the number of valid chars (including space char if used)

        special_token = ["<GO>", "<STOP>"]
        char_list = special_token + char_list

        self.go_idx = 0
        self.stop_idx = 1

        self.dict = {c: idx for idx, c in enumerate(char_list)}

        self.num_classes = len(self.dict)

    def __call__(self, data: Dict[str, Any]) -> str:
        char_indices = str2idx(data["label"], self.dict, max_text_len=self.max_text_len, lower=self.lower)

        if char_indices is None:
            char_indices = []
        data["length"] = np.array(len(char_indices), dtype=np.int32)

        char_indices = (
            [self.go_idx] + char_indices + [self.stop_idx] + [self.go_idx] * (self.max_text_len - len(char_indices))
        )
        data["text_seq"] = np.array(char_indices, dtype=np.int32)

        data["text_length"] = len(data["label"])
        data["text_padded"] = data["label"] + " " * (self.max_text_len - len(data["label"]))
        return data
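The GO/STOP framing in `__call__` can be sketched with a toy dictionary (the three-character dict below is illustrative; the real class builds `self.dict` from a dictionary file, reserving indices 0 and 1 for `<GO>`/`<STOP>`):

```python
# Indices 0 and 1 are reserved for the <GO> and <STOP> tokens
go_idx, stop_idx, max_text_len = 0, 1, 8
char_dict = {c: i + 2 for i, c in enumerate("abc")}  # toy dictionary

label = "cab"
char_indices = [char_dict[c] for c in label]
# GO + chars + STOP, then pad with GO up to max_text_len + 2
seq = [go_idx] + char_indices + [stop_idx] + [go_idx] * (max_text_len - len(char_indices))
print(seq)  # [0, 4, 2, 3, 1, 0, 0, 0, 0, 0]
```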
mindocr.data.transforms.rec_transforms.RecAttnLabelEncode.__init__(max_text_len=25, character_dict_path=None, use_space_char=False, lower=False, **kwargs)

Convert text label (str) to a sequence of character indices according to the char dictionary

PARAMETER DESCRIPTION
max_text_len

pad the label text to a fixed length (max_text_len) for attention loss computation.

TYPE: int DEFAULT: 25

character_dict_path

path to dictionary, if None, a dictionary containing 36 chars (i.e., "0123456789abcdefghijklmnopqrstuvwxyz") will be used.

TYPE: Optional[str] DEFAULT: None

use_space_char(bool)

if True, add space char to the dict to recognize the space in between two words

lower

if True, all upper-case chars in the label text will be converted to lower case. Set to True if the dictionary only contains lower-case chars; set to False to recognize both upper-case and lower-case characters.

TYPE: bool DEFAULT: False

ATTRIBUTE DESCRIPTION
go_idx

the index of the GO token

stop_idx

the index of the STOP token

num_valid_chars

the number of valid characters (including space char if used) in the dictionary

num_classes

the number of classes (the valid characters plus the special GO and STOP tokens), so num_classes = num_valid_chars + 2

Source code in mindocr\data\transforms\rec_transforms.py
def __init__(
    self,
    max_text_len: int = 25,
    character_dict_path: Optional[str] = None,
    use_space_char: bool = False,
    lower: bool = False,
    **kwargs,
) -> None:
    """
    Convert text label (str) to a sequence of character indices according to the char dictionary

    Args:
        max_text_len: pad the label text to a fixed length (max_text_len) for attention loss computation.
        character_dict_path: path to the dictionary; if None, a dictionary containing 36 chars
            (i.e., "0123456789abcdefghijklmnopqrstuvwxyz") will be used.
        use_space_char (bool): if True, add the space char to the dict to recognize spaces between two words
        lower (bool): if True, all upper-case chars in the label text will be converted to lower case.
            Set to True if the dictionary only contains lower-case chars; set to False to recognize
            both upper-case and lower-case characters.

    Attributes:
        go_idx: the index of the GO token
        stop_idx: the index of the STOP token
        num_valid_chars: the number of valid characters (including the space char if used) in the dictionary
        num_classes: the number of classes (the valid characters plus the special GO and STOP tokens),
            so num_classes = num_valid_chars + 2
    """
    self.max_text_len = max_text_len
    self.lower = lower

    # read dict
    if character_dict_path is None:
        char_list = list("0123456789abcdefghijklmnopqrstuvwxyz")

        self.lower = True
        print("INFO: The character_dict_path is None, the model can only recognize digits and lower-case letters")
    else:
        # parse char dictionary
        char_list = []
        with open(character_dict_path, "r") as f:
            for line in f:
                c = line.rstrip("\n\r")
                char_list.append(c)

    # add space char if set
    if use_space_char:
        if " " not in char_list:
            char_list.append(" ")
        self.space_idx = len(char_list) + 1
    else:
        if " " in char_list:
            print(
                "WARNING: The dictionary still contains the space char although use_space_char is set to False, "
                "because the space char is coded in the dictionary file ",
                character_dict_path,
            )

    self.num_valid_chars = len(char_list)  # the number of valid chars (including space char if used)

    special_token = ["<GO>", "<STOP>"]
    char_list = special_token + char_list

    self.go_idx = 0
    self.stop_idx = 1

    self.dict = {c: idx for idx, c in enumerate(char_list)}

    self.num_classes = len(self.dict)
mindocr.data.transforms.rec_transforms.RecCTCLabelEncode

Bases: object

Convert text label (str) to a sequence of character indices according to the char dictionary

PARAMETER DESCRIPTION
max_text_len

pad the label text to a fixed length (max_text_len) for CTC loss computation.

DEFAULT: 23

character_dict_path

path to dictionary, if None, a dictionary containing 36 chars (i.e., "0123456789abcdefghijklmnopqrstuvwxyz") will be used.

DEFAULT: None

use_space_char(bool)

if True, add space char to the dict to recognize the space in between two words

blank_at_last(bool)

padding with the blank index (not the space index). If True, a blank/padding token will be appended to the end of the dictionary, so that blank_index = num_chars, where num_chars is the number of characters in the dictionary including the space char if used. If False, the blank token will be inserted at the beginning of the dictionary, so blank_index = 0.

lower

if True, all upper-case chars in the label text will be converted to lower case. Set to True if the dictionary only contains lower-case chars; set to False to recognize both upper-case and lower-case characters.

TYPE: bool DEFAULT: False

ATTRIBUTE DESCRIPTION
blank_idx

the index of the blank token for padding

num_valid_chars

the number of valid characters (including space char if used) in the dictionary

num_classes

the number of classes (the valid characters plus the special blank/padding token), so num_classes = num_valid_chars + 1

Source code in mindocr\data\transforms\rec_transforms.py
class RecCTCLabelEncode(object):
    """Convert text label (str) to a sequence of character indices according to the char dictionary

    Args:
        max_text_len: pad the label text to a fixed length (max_text_len) for CTC loss computation.
        character_dict_path: path to the dictionary; if None, a dictionary containing 36 chars
            (i.e., "0123456789abcdefghijklmnopqrstuvwxyz") will be used.
        use_space_char (bool): if True, add the space char to the dict to recognize spaces between two words
        blank_at_last (bool): padding with the blank index (not the space index). If True, a blank/padding token
            will be appended to the end of the dictionary, so that blank_index = num_chars, where num_chars is
            the number of characters in the dictionary including the space char if used. If False, the blank
            token will be inserted at the beginning of the dictionary, so blank_index = 0.
        lower (bool): if True, all upper-case chars in the label text will be converted to lower case.
            Set to True if the dictionary only contains lower-case chars;
            set to False to recognize both upper-case and lower-case characters.

    Attributes:
        blank_idx: the index of the blank token used for padding
        num_valid_chars: the number of valid characters (including the space char if used) in the dictionary
        num_classes: the number of classes (the valid characters plus the special blank/padding token),
            so num_classes = num_valid_chars + 1


    """

    def __init__(
        self,
        max_text_len=23,
        character_dict_path=None,
        use_space_char=False,
        blank_at_last=True,
        lower=False,
        **kwargs,
        # start_token='<BOS>',
        # end_token='<EOS>',
        # unkown_token='',
    ):
        self.max_text_len = max_text_len
        self.space_idx = None
        self.lower = lower

        # read dict
        if character_dict_path is None:
            char_list = [c for c in "0123456789abcdefghijklmnopqrstuvwxyz"]

            self.lower = True
            # print("INFO: The character_dict_path is None, model can only recognize number and lower letters")
        else:
            # TODO: this is commonly used in other modules, wrap into a func or class.
            # parse char dictionary
            char_list = []
            with open(character_dict_path, "r") as f:
                for line in f:
                    c = line.rstrip("\n\r")
                    char_list.append(c)
        # add space char if set
        if use_space_char:
            if " " not in char_list:
                char_list.append(" ")
            self.space_idx = len(char_list) - 1
        else:
            if " " in char_list:
                print(
                    "WARNING: The dict still contains space char in dict although use_space_char is set to be False, "
                    "because the space char is coded in the dictionary file ",
                    character_dict_path,
                )

        self.num_valid_chars = len(char_list)  # the number of valid chars (including space char if used)

        # add blank token for padding
        if blank_at_last:
            # the index of a char in dict is [0, num_chars-1], blank index is set to num_chars
            char_list.append("<PAD>")
            self.blank_idx = self.num_valid_chars
        else:
            char_list = ["<PAD>"] + char_list
            self.blank_idx = 0

        self.dict = {c: idx for idx, c in enumerate(char_list)}

        self.num_classes = len(self.dict)

    def __call__(self, data: dict):
        """
        required keys:
            label -> (str) text string
        added keys:
            text_seq -> (np.ndarray, int32) sequence of character indices padded to max_text_len in shape
            (sequence_len), where out-of-dictionary characters are skipped
            length -> (np.int32) the number of valid chars in the encoded char index sequence, where valid means
            the char is in the dictionary.
            text_padded -> (str) text label padded to a fixed length, to solve the dynamic shape issue in the dataloader.
            text_length -> (int) the length of the original text string label
        """
        char_indices = str2idx(data["label"], self.dict, max_text_len=self.max_text_len, lower=self.lower)

        if char_indices is None:
            char_indices = []
            # return None
        data["length"] = np.array(len(char_indices), dtype=np.int32)
        # padding with blank index
        char_indices = char_indices + [self.blank_idx] * (self.max_text_len - len(char_indices))
        # TODO: rename to char_indices
        data["text_seq"] = np.array(char_indices, dtype=np.int32)
        #
        data["text_length"] = len(data["label"])
        data["text_padded"] = data["label"] + " " * (self.max_text_len - len(data["label"]))

        return data
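The encoding in `__call__` with `blank_at_last=True` can be sketched with a toy dictionary (illustrative; the real class reads the dict from `character_dict_path` and uses the `str2idx` helper):

```python
max_text_len = 6
char_list = list("abc")                      # toy dictionary
blank_idx = len(char_list)                   # <PAD> appended at the end
char_dict = {c: i for i, c in enumerate(char_list)}

label = "cab"
# skip out-of-dictionary characters, as str2idx does
char_indices = [char_dict[c] for c in label if c in char_dict]
# pad with the blank index up to max_text_len
padded = char_indices + [blank_idx] * (max_text_len - len(char_indices))
print(padded)  # [2, 0, 1, 3, 3, 3]
```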
mindocr.data.transforms.rec_transforms.RecCTCLabelEncode.__call__(data)
required keys

label -> (str) text string

added keys

text_seq -> (np.ndarray, int32), sequence of character indices padded to max_text_len in shape (sequence_len), where out-of-dictionary characters are skipped. length -> (np.int32) the number of valid chars in the encoded char index sequence, where valid means the char is in the dictionary. text_padded -> (str) text label padded to a fixed length, to solve the dynamic shape issue in the dataloader. text_length -> (int) the length of the original text string label

Source code in mindocr\data\transforms\rec_transforms.py
def __call__(self, data: dict):
    """
    required keys:
        label -> (str) text string
    added keys:
        text_seq -> (np.ndarray, int32) sequence of character indices padded to max_text_len in shape
        (sequence_len), where out-of-dictionary characters are skipped
        length -> (np.int32) the number of valid chars in the encoded char index sequence, where valid means
        the char is in the dictionary.
        text_padded -> (str) text label padded to a fixed length, to solve the dynamic shape issue in the dataloader.
        text_length -> (int) the length of the original text string label
    """
    char_indices = str2idx(data["label"], self.dict, max_text_len=self.max_text_len, lower=self.lower)

    if char_indices is None:
        char_indices = []
        # return None
    data["length"] = np.array(len(char_indices), dtype=np.int32)
    # padding with blank index
    char_indices = char_indices + [self.blank_idx] * (self.max_text_len - len(char_indices))
    # TODO: rename to char_indices
    data["text_seq"] = np.array(char_indices, dtype=np.int32)
    #
    data["text_length"] = len(data["label"])
    data["text_padded"] = data["label"] + " " * (self.max_text_len - len(data["label"]))

    return data
mindocr.data.transforms.rec_transforms.RecResizeImg

Bases: object

Adapted from PaddleOCR: resize, convert from HWC to CHW, rescale pixel values to [-1, 1].

Source code in mindocr\data\transforms\rec_transforms.py
class RecResizeImg(object):
    """Adapted from PaddleOCR:
    resize, convert from HWC to CHW, rescale pixel values to [-1, 1]
    """

    def __init__(self, image_shape, infer_mode=False, character_dict_path=None, padding=True, **kwargs):
        self.image_shape = image_shape
        self.infer_mode = infer_mode
        self.character_dict_path = character_dict_path
        self.padding = padding

    def __call__(self, data):
        img = data["image"]
        if self.infer_mode and self.character_dict_path is not None:
            norm_img, valid_ratio = resize_norm_img_chinese(img, self.image_shape)
        else:
            norm_img, valid_ratio = resize_norm_img(img, self.image_shape, self.padding)
        data["image"] = norm_img
        data["valid_ratio"] = valid_ratio
        # TODO: data['shape_list'] = ?
        return data
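`resize_norm_img` itself is not shown here, but the usual keep-ratio width computation for recognition resizing can be sketched as follows (a hypothetical helper under assumed behavior, not mindocr's implementation):

```python
import math

def keep_ratio_width(h: int, w: int, target_h: int, target_w: int) -> int:
    """Width after resizing to target_h while preserving the aspect ratio,
    capped at target_w (a sketch of typical recognition resize logic)."""
    return min(math.ceil(target_h / h * w), target_w)

print(keep_ratio_width(64, 400, 32, 320))  # 200
```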
mindocr.data.transforms.rec_transforms.RecResizeNormForInfer

Bases: object

Resize image for text recognition

PARAMETER DESCRIPTION
target_height

target height after resize. Commonly 32 for CRNN and 48 for SVTR. Default is 32.

DEFAULT: 32

target_width

target width. Default is 320. If None, the image width is scaled to keep the aspect ratio unchanged.

DEFAULT: 320

keep_ratio

keep the aspect ratio. If True, resize the image with ratio = target_height / input_height (a fixed image height is required by the recognition network). If False, simply resize to the target size (target_height, target_width).

DEFAULT: True

padding

If True, pad the resized image to the target size with zero RGB values. Only used when keep_ratio is True.

DEFAULT: False

Notes
  1. The default choice (keep_ratio, not padding) is suitable for inference for better accuracy.
Source code in mindocr\data\transforms\rec_transforms.py
class RecResizeNormForInfer(object):
    """
    Resize image for text recognition

    Args:
        target_height: target height after resize. Commonly 32 for CRNN, 48 for SVTR. Default is 32.
        target_width: target width. Default is 320. If None, image width is scaled to make aspect ratio unchanged.
        keep_ratio: keep aspect ratio.
            If True, resize the image with ratio=target_height / input_height (certain image height is required by
            recognition network).
            If False, simply resize to the target size (`target_height`, `target_width`).
        padding: If True, pad the resized image to the target size with zero RGB values.
            Only used when `keep_ratio` is True.

    Notes:
        1. The default choice (keep_ratio, not padding) is suitable for inference for better accuracy.
    """

    def __init__(
        self,
        target_height=32,
        target_width=320,
        keep_ratio=True,
        padding=False,
        interpolation=cv2.INTER_LINEAR,
        norm_before_pad=False,
        mean=[127.0, 127.0, 127.0],
        std=[127.0, 127.0, 127.0],
        **kwargs,
    ):
        self.keep_ratio = keep_ratio
        self.padding = padding
        # self.targt_shape = target_shape
        self.tar_h = target_height
        self.tar_w = target_width
        self.interpolation = interpolation
        self.norm_before_pad = norm_before_pad
        self.mean = np.array(mean, dtype="float32")
        self.std = np.array(std, dtype="float32")

    def norm(self, img):
        return (img - self.mean) / self.std

    def __call__(self, data):
        """
        data: image in shape [h, w, c]
        """
        img = data["image"]
        h, w = img.shape[:2]
        # tar_h, tar_w = self.targt_shape
        resize_h = self.tar_h

        max_wh_ratio = self.tar_w / float(self.tar_h)

        if not self.keep_ratio:
            assert self.tar_w is not None, "Must specify target_width if keep_ratio is False"
            resize_w = self.tar_w  # if self.tar_w is not None else resized_h * self.max_wh_ratio
        else:
            src_wh_ratio = w / float(h)
            resize_w = math.ceil(min(src_wh_ratio, max_wh_ratio) * resize_h)
        # print('Rec resize: ', h, w, "->", resize_h, resize_w)
        resized_img = cv2.resize(img, (resize_w, resize_h), interpolation=self.interpolation)

        # TODO: norm before padding

        data["shape_list"] = np.array(
            [h, w, resize_h / h, resize_w / w], dtype=np.float32
        )  # TODO: reformat, currently align to det
        if self.norm_before_pad:
            resized_img = self.norm(resized_img)

        if self.padding and self.keep_ratio:
            padded_img = np.zeros((self.tar_h, self.tar_w, 3), dtype=resized_img.dtype)
            padded_img[:, :resize_w, :] = resized_img
            data["image"] = padded_img
        else:
            data["image"] = resized_img

        if not self.norm_before_pad:
            data["image"] = self.norm(data["image"])

        return data
mindocr.data.transforms.rec_transforms.RecResizeNormForInfer.__call__(data)
Source code in mindocr\data\transforms\rec_transforms.py
def __call__(self, data):
    """
    data: image in shape [h, w, c]
    """
    img = data["image"]
    h, w = img.shape[:2]
    # tar_h, tar_w = self.targt_shape
    resize_h = self.tar_h

    max_wh_ratio = self.tar_w / float(self.tar_h)

    if not self.keep_ratio:
        assert self.tar_w is not None, "Must specify target_width if keep_ratio is False"
        resize_w = self.tar_w  # if self.tar_w is not None else resized_h * self.max_wh_ratio
    else:
        src_wh_ratio = w / float(h)
        resize_w = math.ceil(min(src_wh_ratio, max_wh_ratio) * resize_h)
    # print('Rec resize: ', h, w, "->", resize_h, resize_w)
    resized_img = cv2.resize(img, (resize_w, resize_h), interpolation=self.interpolation)

    # TODO: norm before padding

    data["shape_list"] = np.array(
        [h, w, resize_h / h, resize_w / w], dtype=np.float32
    )  # TODO: reformat, currently align to det
    if self.norm_before_pad:
        resized_img = self.norm(resized_img)

    if self.padding and self.keep_ratio:
        padded_img = np.zeros((self.tar_h, self.tar_w, 3), dtype=resized_img.dtype)
        padded_img[:, :resize_w, :] = resized_img
        data["image"] = padded_img
    else:
        data["image"] = resized_img

    if not self.norm_before_pad:
        data["image"] = self.norm(data["image"])

    return data
mindocr.data.transforms.rec_transforms.Rotate90IfVertical

Rotate the image by 90 degrees when the height/width ratio is larger than the given threshold. Note: it needs to be called before image resize.

Source code in mindocr\data\transforms\rec_transforms.py
class Rotate90IfVertical:
    """Rotate the image by 90 degrees when the height/width ratio is larger than the given threshold.
    Note: It needs to be called before image resize."""

    def __init__(self, threshold: float = 1.5, direction: str = "counterclockwise", **kwargs):
        self.threshold = threshold

        if direction == "counterclockwise":
            self.flag = cv2.ROTATE_90_COUNTERCLOCKWISE
        elif direction == "clockwise":
            self.flag = cv2.ROTATE_90_CLOCKWISE
        else:
            raise ValueError("Unsupported direction")

    def __call__(self, data):
        img = data["image"]

        h, w, _ = img.shape
        if h / w > self.threshold:
            img = cv2.rotate(img, self.flag)

        data["image"] = img
        return data
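For reference, the rotation logic above can be reproduced without OpenCV: `np.rot90` with `k=1` matches `cv2.ROTATE_90_COUNTERCLOCKWISE` on HWC arrays (a minimal sketch assuming a NumPy image; `rotate90_if_vertical` is a hypothetical helper name):

```python
import numpy as np

def rotate90_if_vertical(img: np.ndarray, threshold: float = 1.5) -> np.ndarray:
    """Rotate an HWC image counterclockwise by 90 degrees if it is tall (h/w > threshold)."""
    h, w = img.shape[:2]
    if h / w > threshold:
        img = np.rot90(img, k=1)  # counterclockwise, like cv2.ROTATE_90_COUNTERCLOCKWISE
    return img

tall = np.zeros((120, 40, 3), dtype=np.uint8)  # h/w = 3.0 > 1.5 -> rotated
wide = np.zeros((40, 120, 3), dtype=np.uint8)  # h/w < 1.5 -> unchanged
print(rotate90_if_vertical(tall).shape)  # (40, 120, 3)
print(rotate90_if_vertical(wide).shape)  # (40, 120, 3)
```

Either way the output is a wide image, which is what the downstream recognition resize expects.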
mindocr.data.transforms.rec_transforms.resize_norm_img(img, image_shape, padding=True, interpolation=cv2.INTER_LINEAR)

resize image

PARAMETER DESCRIPTION
img

shape (H, W, C)

image_shape

image shape after resize, in (C, H, W)

padding

if True, resize while preserving the H/W ratio, then pad the blank area.

DEFAULT: True

Source code in mindocr\data\transforms\rec_transforms.py
def resize_norm_img(img, image_shape, padding=True, interpolation=cv2.INTER_LINEAR):
    """
    resize image
    Args:
        img: shape (H, W, C)
        image_shape: image shape after resize, in (C, H, W)
        padding: if True, resize while preserving the H/W ratio, then pad the blank area.

    """
    imgH, imgW = image_shape
    h = img.shape[0]
    w = img.shape[1]
    c = img.shape[2]
    if not padding:
        resized_image = cv2.resize(img, (imgW, imgH), interpolation=interpolation)
        resized_w = imgW
    else:
        ratio = w / float(h)
        if math.ceil(imgH * ratio) > imgW:
            resized_w = imgW
        else:
            resized_w = int(math.ceil(imgH * ratio))
        resized_image = cv2.resize(img, (resized_w, imgH))

    """
    resized_image = resized_image.astype('float32')
    if image_shape[0] == 1:
        resized_image = resized_image / 255
        resized_image = resized_image[np.newaxis, :]
    else:
        resized_image = resized_image.transpose((2, 0, 1)) / 255
    resized_image -= 0.5
    resized_image /= 0.5
    """
    padding_im = np.zeros((imgH, imgW, c), dtype=np.uint8)
    padding_im[:, 0:resized_w, :] = resized_image
    valid_ratio = min(1.0, float(resized_w / imgW))
    return padding_im, valid_ratio
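To see how `resized_w` and `valid_ratio` behave when `padding=True`, the width arithmetic can be isolated in pure Python (a sketch; `rec_resize_geometry` is a made-up helper name, not a mindocr function):

```python
import math

def rec_resize_geometry(h, w, imgH=32, imgW=320):
    """Mirror the width/valid_ratio arithmetic of resize_norm_img with padding=True."""
    ratio = w / float(h)
    if math.ceil(imgH * ratio) > imgW:
        resized_w = imgW          # too wide: clip to the target width
    else:
        resized_w = int(math.ceil(imgH * ratio))
    valid_ratio = min(1.0, resized_w / imgW)
    return resized_w, valid_ratio

print(rec_resize_geometry(64, 200))   # (100, 0.3125): right 220 columns are zero padding
print(rec_resize_geometry(32, 1000))  # (320, 1.0): very wide crops are clipped, not padded
```

`valid_ratio` tells downstream heads which fraction of the padded width holds real image content.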
mindocr.data.transforms.rec_transforms.resize_norm_img_chinese(img, image_shape)

Adapted from Paddle.

Source code in mindocr\data\transforms\rec_transforms.py
def resize_norm_img_chinese(img, image_shape):
    """Adapted from Paddle."""
    imgH, imgW = image_shape
    # todo: change to 0 and modified image shape
    max_wh_ratio = imgW * 1.0 / imgH
    h, w = img.shape[0], img.shape[1]
    c = img.shape[2]
    ratio = w * 1.0 / h

    imgW = int(imgH * max_wh_ratio)
    if math.ceil(imgH * ratio) > imgW:
        resized_w = imgW
    else:
        resized_w = int(math.ceil(imgH * ratio))
    resized_image = cv2.resize(img, (resized_w, imgH))

    """
    resized_image = resized_image.astype('float32')
    if image_shape[0] == 1:
        resized_image = resized_image / 255
        resized_image = resized_image[np.newaxis, :]
    else:
        resized_image = resized_image.transpose((2, 0, 1)) / 255
    resized_image -= 0.5
    resized_image /= 0.5
    """
    # padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32)
    padding_im = np.zeros((imgH, imgW, c), dtype=np.uint8)
    # padding_im[:, :, 0:resized_w] = resized_image
    padding_im[:, 0:resized_w, :] = resized_image
    valid_ratio = min(1.0, float(resized_w / imgW))
    return padding_im, valid_ratio
mindocr.data.transforms.rec_transforms.str2idx(text, label_dict, max_text_len=23, lower=False)

Encode text (string) to a sequence of char indices

PARAMETER DESCRIPTION
text

text string

TYPE: str

RETURNS DESCRIPTION
char_indices

char index seq

TYPE: List[int]

Source code in mindocr\data\transforms\rec_transforms.py
def str2idx(text: str, label_dict: Dict[str, int], max_text_len: int = 23, lower: bool = False) -> List[int]:
    """
    Encode text (string) to a sequence of char indices
    Args:
        text (str): text string
    Returns:
        char_indices (List[int]): char index seq
    """
    if len(text) == 0 or len(text) > max_text_len:
        return None
    if lower:
        text = text.lower()

    char_indices = []
    # TODO: for char not in the dictionary, skipping may lead to None data. Use a char replacement? refer to mmocr
    for char in text:
        if char not in label_dict:
            # print('WARNING: {} is not in dict'.format(char))
            continue
        char_indices.append(label_dict[char])
    if len(char_indices) == 0:
        print("WARNING: {} does not contain any valid char in the dict".format(text))
        return None

    return char_indices
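A quick round trip with a toy dictionary illustrates the behavior, including the `None` return for out-of-dictionary text (the 26-letter `toy_dict` below is a made-up example, not a real mindocr character dictionary; the function is a compact standalone copy of the encoder above):

```python
def str2idx(text, label_dict, max_text_len=23, lower=False):
    """Compact copy of the encoder above, for illustration only."""
    if len(text) == 0 or len(text) > max_text_len:
        return None
    if lower:
        text = text.lower()
    # characters missing from the dictionary are skipped
    char_indices = [label_dict[c] for c in text if c in label_dict]
    return char_indices or None  # empty result -> None, matching the original

toy_dict = {c: i for i, c in enumerate("abcdefghijklmnopqrstuvwxyz")}  # hypothetical dict
print(str2idx("Cat", toy_dict, lower=True))  # [2, 0, 19]
print(str2idx("123", toy_dict))              # None: no char is in the dictionary
```

As the TODO in the source notes, skipped characters can silently shorten or empty the label, so dictionary coverage matters.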
mindocr.data.transforms.svtr_transform
mindocr.data.transforms.svtr_transform.CVRescale

Bases: object

Source code in mindocr\data\transforms\svtr_transform.py
class CVRescale(object):
    def __init__(self, factor=4, base_size=(128, 512)):
        """Define image scales using gaussian pyramid and rescale image to target scale.

        Args:
            factor: the decayed factor from base size, factor=4 keeps target scale by default.
            base_size: base size used to build the bottom layer of the pyramid
        """
        if isinstance(factor, numbers.Number):
            self.factor = round(sample_uniform(0, factor))
        elif isinstance(factor, (tuple, list)) and len(factor) == 2:
            self.factor = round(sample_uniform(factor[0], factor[1]))
        else:
            raise Exception("factor must be number or list with length 2")
        # assert factor is valid
        self.base_h, self.base_w = base_size[:2]

    def __call__(self, img):
        if self.factor == 0:
            return img
        src_h, src_w = img.shape[:2]
        cur_w, cur_h = self.base_w, self.base_h
        scale_img = cv2.resize(img, (cur_w, cur_h), interpolation=get_interpolation())
        for _ in range(self.factor):
            scale_img = cv2.pyrDown(scale_img)
        scale_img = cv2.resize(scale_img, (src_w, src_h), interpolation=get_interpolation())
        return scale_img
mindocr.data.transforms.svtr_transform.CVRescale.__init__(factor=4, base_size=(128, 512))

Define image scales using gaussian pyramid and rescale image to target scale.

PARAMETER DESCRIPTION
factor

the decayed factor from base size, factor=4 keeps target scale by default.

DEFAULT: 4

base_size

base size used to build the bottom layer of the pyramid

DEFAULT: (128, 512)

Source code in mindocr\data\transforms\svtr_transform.py
def __init__(self, factor=4, base_size=(128, 512)):
    """Define image scales using gaussian pyramid and rescale image to target scale.

    Args:
        factor: the decayed factor from base size, factor=4 keeps target scale by default.
        base_size: base size used to build the bottom layer of the pyramid
    """
    if isinstance(factor, numbers.Number):
        self.factor = round(sample_uniform(0, factor))
    elif isinstance(factor, (tuple, list)) and len(factor) == 2:
        self.factor = round(sample_uniform(factor[0], factor[1]))
    else:
        raise Exception("factor must be number or list with length 2")
    # assert factor is valid
    self.base_h, self.base_w = base_size[:2]
mindocr.data.transforms.transforms_factory

Create and run transformations from a config or predefined transformation pipeline

mindocr.data.transforms.transforms_factory.create_transforms(transform_pipeline, global_config=None)

Create a sequence of callable transforms.

PARAMETER DESCRIPTION
transform_pipeline

list of callable instances or dicts, where each key is a transformation class name and its value holds the args, e.g. [{'DecodeImage': {'img_mode': 'BGR', 'channel_first': False}}] or [DecodeImage(img_mode='BGR')]

TYPE: List

RETURNS DESCRIPTION

list of data transformation functions

Source code in mindocr\data\transforms\transforms_factory.py
def create_transforms(transform_pipeline: List, global_config: Dict = None):
    """
    Create a sequence of callable transforms.

    Args:
        transform_pipeline (List): list of callable instances or dicts where each key is a transformation class name,
            and its value holds the args.
            e.g. [{'DecodeImage': {'img_mode': 'BGR', 'channel_first': False}}]
                 [DecodeImage(img_mode='BGR')]

    Returns:
        list of data transformation functions
    """
    assert isinstance(
        transform_pipeline, list
    ), f"transform_pipeline config should be a list, but {type(transform_pipeline)} detected"

    transforms = []
    for transform_config in transform_pipeline:
        if isinstance(transform_config, dict):
            assert len(transform_config) == 1, "yaml format error in transforms"
            trans_name = list(transform_config.keys())[0]
            param = {} if transform_config[trans_name] is None else transform_config[trans_name]
            if global_config is not None:
                param.update(global_config)
            # TODO: assert undefined transform class

            transform = eval(trans_name)(**param)
            transforms.append(transform)
        elif callable(transform_config):
            transforms.append(transform_config)
        else:
            raise TypeError("transform_config must be a dict or a callable instance")
        # print(global_config)
    return transforms
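The dict-to-object dispatch above can be sketched with an explicit registry in place of `eval` (`ToUpper`, `AddSuffix`, and `build_pipeline` are hypothetical stand-ins, not mindocr classes):

```python
# Minimal sketch of the same config-to-pipeline pattern, with a registry instead of eval().
class ToUpper:
    def __call__(self, data):
        data["text"] = data["text"].upper()
        return data

class AddSuffix:
    def __init__(self, suffix="!"):
        self.suffix = suffix

    def __call__(self, data):
        data["text"] += self.suffix
        return data

REGISTRY = {"ToUpper": ToUpper, "AddSuffix": AddSuffix}

def build_pipeline(pipeline_cfg):
    transforms = []
    for cfg in pipeline_cfg:
        if isinstance(cfg, dict):
            assert len(cfg) == 1, "each step must contain exactly one transform name"
            name, param = next(iter(cfg.items()))
            transforms.append(REGISTRY[name](**(param or {})))  # None params -> no kwargs
        elif callable(cfg):
            transforms.append(cfg)
        else:
            raise TypeError("each step must be a dict or a callable")
    return transforms

pipeline = build_pipeline([{"ToUpper": None}, {"AddSuffix": {"suffix": "?"}}])
data = {"text": "hello"}
for t in pipeline:
    data = t(data)
print(data["text"])  # HELLO?
```

A registry avoids `eval` on arbitrary config strings while keeping the same one-key-per-step YAML shape.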
mindocr.data.transforms.transforms_factory.transforms_dbnet_icdar15(phase='train')

Get pre-defined transform config for dbnet on icdar15 dataset.

PARAMETER DESCRIPTION
phase

train, eval, infer

DEFAULT: 'train'

RETURNS DESCRIPTION

list of dicts for the data transformation pipeline, which can be converted to functions by 'create_transforms'

Source code in mindocr\data\transforms\transforms_factory.py
def transforms_dbnet_icdar15(phase="train"):
    """
    Get pre-defined transform config for dbnet on icdar15 dataset.
    Args:
        phase: train, eval, infer
    Returns:
        list of dicts for the data transformation pipeline, which can be converted to functions by 'create_transforms'
    """
    if phase == "train":
        pipeline = [
            {"DecodeImage": {"img_mode": "RGB", "to_float32": False}},
            {"DetLabelEncode": None},
            {"RandomScale": {"scale_range": [1.022, 3.0]}},
            {"IaaAugment": {"Affine": {"rotate": [-10, 10]}, "Fliplr": {"p": 0.5}}},
            {"RandomCropWithBBox": {"max_tries": 100, "min_crop_ratio": 0.1, "crop_size": (640, 640)}},
            {"ShrinkBinaryMap": {"min_text_size": 8, "shrink_ratio": 0.4}},
            {
                "BorderMap": {
                    "shrink_ratio": 0.4,
                    "thresh_min": 0.3,
                    "thresh_max": 0.7,
                }
            },
            {"RandomColorAdjust": {"brightness": 32.0 / 255, "saturation": 0.5}},
            {
                "NormalizeImage": {
                    "bgr_to_rgb": False,
                    "is_hwc": True,
                    "mean": [123.675, 116.28, 103.53],
                    "std": [58.395, 57.12, 57.375],
                }
            },
            {"ToCHWImage": None},
        ]

    elif phase == "eval":
        pipeline = [
            {"DecodeImage": {"img_mode": "RGB", "to_float32": False}},
            {"DetLabelEncode": None},
            {"GridResize": {"factor": 32}},
            {"ScalePadImage": {"target_size": [736, 1280]}},
            {
                "NormalizeImage": {
                    "bgr_to_rgb": False,
                    "is_hwc": True,
                    "mean": [123.675, 116.28, 103.53],
                    "std": [58.395, 57.12, 57.375],
                }
            },
            {"ToCHWImage": None},
        ]
    else:
        pipeline = [
            {"DecodeImage": {"img_mode": "RGB", "to_float32": False}},
            {"GridResize": {"factor": 32}},
            {"ScalePadImage": {"target_size": [736, 1280]}},
            {
                "NormalizeImage": {
                    "bgr_to_rgb": False,
                    "is_hwc": True,
                    "mean": [123.675, 116.28, 103.53],
                    "std": [58.395, 57.12, 57.375],
                }
            },
            {"ToCHWImage": None},
        ]
    return pipeline

mindocr.losses

mindocr.losses.builder
mindocr.losses.builder.build_loss(name, **kwargs)

Create the loss function.

PARAMETER DESCRIPTION
name

loss function name, exactly the same as one of the supported loss class names

TYPE: str

Return

nn.LossBase

Example
Create a CTC Loss module

from mindocr.losses import build_loss
loss_func_name = "CTCLoss"
loss_func_config = {"pred_seq_len": 25, "max_label_len": 24, "batch_size": 32}
loss_fn = build_loss(loss_func_name, **loss_func_config)
loss_fn  # CTCLoss<>

Source code in mindocr\losses\builder.py
def build_loss(name, **kwargs):
    """
    Create the loss function.

    Args:
        name (str): loss function name, exactly the same as one of the supported loss class names

    Return:
        nn.LossBase

    Example:
        >>> # Create a CTC Loss module
        >>> from mindocr.losses import build_loss
        >>> loss_func_name = "CTCLoss"
        >>> loss_func_config = {"pred_seq_len": 25, "max_label_len": 24, "batch_size": 32}
        >>> loss_fn = build_loss(loss_func_name, **loss_func_config)
        >>> loss_fn
        CTCLoss<>
    """
    assert name in supported_losses, f"Invalid loss name {name}, support losses are {supported_losses}"

    loss_fn = eval(name)(**kwargs)

    # print('=> Loss func input args: \n\t', inspect.signature(loss_fn.construct))

    return loss_fn
mindocr.losses.cls_loss
mindocr.losses.cls_loss.CrossEntropySmooth

Bases: nn.LossBase

Cross entropy loss with label smoothing. It applies the softmax activation function to the input logits and computes the cross entropy between the logits and the label.

PARAMETER DESCRIPTION
smoothing

Label smoothing factor, a regularization tool used to prevent the model from overfitting when calculating Loss. The value range is [0.0, 1.0]. Default: 0.0.

DEFAULT: 0.0

aux_factor

Auxiliary loss factor. Set aux_factor > 0.0 if the model has auxiliary logit outputs (i.e., deep supervision), like inception_v3. Default: 0.0.

DEFAULT: 0.0

reduction

Apply specific reduction method to the output: 'mean' or 'sum'. Default: 'mean'.

DEFAULT: 'mean'

weight

Class weight. Shape [C]. A rescaling weight applied to the loss of each batch element. Data type must be float16 or float32.

TYPE: Tensor DEFAULT: None

Inputs

logits (Tensor or Tuple of Tensor): Input logits. Shape [N, C], where N is # samples and C is # classes. A tuple of multiple logits is supported in the order (main_logits, aux_logits) for auxiliary loss used in networks like inception_v3.

labels (Tensor): Ground truth label. Shape [N] or [N, C]. (1) Shape [N]: sparse labels representing the class indices; must be int type. (2) Shape [N, C]: dense labels representing the ground-truth class probability values, or one-hot labels; must be float type.

Source code in mindocr\losses\cls_loss.py
class CrossEntropySmooth(nn.LossBase):
    """
    Cross entropy loss with label smoothing.
    Applies the softmax activation function to the input `logits` and computes the cross entropy
    between the logits and the label.

    Args:
        smoothing: Label smoothing factor, a regularization tool used to prevent the model
            from overfitting when calculating Loss. The value range is [0.0, 1.0]. Default: 0.0.
        aux_factor: Auxiliary loss factor. Set aux_factor > 0.0 if the model has auxiliary logit outputs
            (i.e., deep supervision), like inception_v3.  Default: 0.0.
        reduction: Apply specific reduction method to the output: 'mean' or 'sum'. Default: 'mean'.
        weight (Tensor): Class weight. Shape [C]. A rescaling weight applied to the loss of each batch element.
            Data type must be float16 or float32.

    Inputs:
        logits (Tensor or Tuple of Tensor): Input logits. Shape [N, C], where N is # samples, C is # classes.
            Tuple composed of multiple logits are supported in order (main_logits, aux_logits)
            for auxiliary loss used in networks like inception_v3.
        labels (Tensor): Ground truth label. Shape: [N] or [N, C].
            (1) Shape (N), sparse labels representing the class indices. Must be int type.
            (2) Shape [N, C], dense labels representing the ground truth class probability values,
            or the one-hot labels. Must be float type.
    """

    def __init__(self, smoothing=0.0, aux_factor=0.0, reduction="mean", weight=None):
        super().__init__()
        self.smoothing = smoothing
        self.aux_factor = aux_factor
        self.reduction = reduction
        self.weight = weight

    def construct(self, logits, labels):
        loss_aux = 0

        if isinstance(logits, tuple):
            main_logits = logits[0]
            for aux in logits[1:]:
                if self.aux_factor > 0:
                    loss_aux += F.cross_entropy(
                        aux, labels, weight=self.weight, reduction=self.reduction, label_smoothing=self.smoothing
                    )
        else:
            main_logits = logits

        loss_logits = F.cross_entropy(
            main_logits, labels, weight=self.weight, reduction=self.reduction, label_smoothing=self.smoothing
        )
        loss = loss_logits + self.aux_factor * loss_aux
        return loss
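The smoothing itself is easy to write out in NumPy. This sketch uses the common scheme that spreads the smoothing mass uniformly over all classes (an assumption for illustration; it is a standalone sketch, not the MindSpore `F.cross_entropy` implementation):

```python
import numpy as np

def smoothed_ce(logits, labels, smoothing=0.0):
    """Cross entropy with label smoothing for sparse integer labels (NumPy sketch)."""
    n, c = logits.shape
    # numerically stable log-softmax
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_p = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # smoothed target: spread `smoothing` uniformly, keep 1 - smoothing on the true class
    target = np.full((n, c), smoothing / c)
    target[np.arange(n), labels] += 1.0 - smoothing
    return float(-(target * log_p).sum(axis=1).mean())
```

With `smoothing=0` this reduces to plain cross entropy; increasing `smoothing` penalizes over-confident predictions on the true class.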
mindocr.losses.det_loss
mindocr.losses.det_loss.BalancedBCELoss

Bases: nn.LossBase

Balanced cross entropy loss.

Source code in mindocr\losses\det_loss.py
class BalancedBCELoss(nn.LossBase):
    """Balanced cross entropy loss."""

    def __init__(self, negative_ratio=3, eps=1e-6):
        super().__init__()
        self._negative_ratio = negative_ratio
        self._eps = eps
        self._bce_loss = ops.BinaryCrossEntropy(reduction="none")

    def construct(self, pred, gt, mask):
        """
        Args:
            pred: shape :math:`(N, 1, H, W)`, the prediction of network
            gt: shape :math:`(N, 1, H, W)`, the target
            mask: shape :math:`(N, H, W)`, the mask indicates positive regions
        """
        pred = pred.squeeze(axis=1)
        gt = gt.squeeze(axis=1)

        positive = gt * mask
        negative = (1 - gt) * mask

        pos_count = positive.sum(axis=(1, 2), keepdims=True).astype(ms.int32)
        neg_count = negative.sum(axis=(1, 2), keepdims=True).astype(ms.int32)
        neg_count = ops.minimum(neg_count, pos_count * self._negative_ratio).squeeze(axis=(1, 2))

        loss = self._bce_loss(pred, gt, None)

        pos_loss = loss * positive
        neg_loss = (loss * negative).view(loss.shape[0], -1)

        neg_vals, _ = ops.sort(neg_loss)
        neg_index = ops.stack((mnp.arange(loss.shape[0]), neg_vals.shape[1] - neg_count), axis=1)
        min_neg_score = ops.expand_dims(ops.gather_nd(neg_vals, neg_index), axis=1)

        neg_loss_mask = (neg_loss >= min_neg_score).astype(ms.float32)  # filter values less than top k
        neg_loss_mask = ops.stop_gradient(neg_loss_mask)

        neg_loss = neg_loss_mask * neg_loss

        return (pos_loss.sum() + neg_loss.sum()) / (
            pos_count.astype(ms.float32).sum() + neg_count.astype(ms.float32).sum() + self._eps
        )
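The balancing strategy (keep every positive pixel, keep only the `negative_ratio * #positives` hardest negatives) is easier to see in NumPy than in the graph-mode sort/gather code above (a sketch over flat probability maps; `balanced_bce_sketch` is a hypothetical name):

```python
import numpy as np

def balanced_bce_sketch(pred, gt, mask, negative_ratio=3, eps=1e-6):
    """OHEM-style balanced BCE: all positive losses + top-k hardest negative losses."""
    loss = -(gt * np.log(pred + eps) + (1 - gt) * np.log(1 - pred + eps))
    pos = (gt * mask).astype(bool)
    neg = ((1 - gt) * mask).astype(bool)
    neg_count = min(int(neg.sum()), int(pos.sum()) * negative_ratio)
    hardest_neg = np.sort(loss[neg])[::-1][:neg_count]  # largest negative losses first
    return (loss[pos].sum() + hardest_neg.sum()) / (pos.sum() + neg_count + eps)

pred = np.array([0.9, 0.1, 0.8, 0.7])  # one positive pixel, three negatives
gt = np.array([1.0, 0.0, 0.0, 0.0])
mask = np.ones(4)
# with negative_ratio=1, only the hardest negative (pred=0.8) contributes
print(round(float(balanced_bce_sketch(pred, gt, mask, negative_ratio=1)), 3))  # 0.857
```

Easy negatives (here `pred=0.1`) are discarded so the abundant background does not swamp the rare text pixels.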
mindocr.losses.det_loss.BalancedBCELoss.construct(pred, gt, mask)
PARAMETER DESCRIPTION
pred

shape :math:(N, 1, H, W), the prediction of network

gt

shape :math:(N, 1, H, W), the target

mask

shape :math:(N, H, W), the mask indicates positive regions

Source code in mindocr\losses\det_loss.py
def construct(self, pred, gt, mask):
    """
    Args:
        pred: shape :math:`(N, 1, H, W)`, the prediction of network
        gt: shape :math:`(N, 1, H, W)`, the target
        mask: shape :math:`(N, H, W)`, the mask indicates positive regions
    """
    pred = pred.squeeze(axis=1)
    gt = gt.squeeze(axis=1)

    positive = gt * mask
    negative = (1 - gt) * mask

    pos_count = positive.sum(axis=(1, 2), keepdims=True).astype(ms.int32)
    neg_count = negative.sum(axis=(1, 2), keepdims=True).astype(ms.int32)
    neg_count = ops.minimum(neg_count, pos_count * self._negative_ratio).squeeze(axis=(1, 2))

    loss = self._bce_loss(pred, gt, None)

    pos_loss = loss * positive
    neg_loss = (loss * negative).view(loss.shape[0], -1)

    neg_vals, _ = ops.sort(neg_loss)
    neg_index = ops.stack((mnp.arange(loss.shape[0]), neg_vals.shape[1] - neg_count), axis=1)
    min_neg_score = ops.expand_dims(ops.gather_nd(neg_vals, neg_index), axis=1)

    neg_loss_mask = (neg_loss >= min_neg_score).astype(ms.float32)  # filter values less than top k
    neg_loss_mask = ops.stop_gradient(neg_loss_mask)

    neg_loss = neg_loss_mask * neg_loss

    return (pos_loss.sum() + neg_loss.sum()) / (
        pos_count.astype(ms.float32).sum() + neg_count.astype(ms.float32).sum() + self._eps
    )
mindocr.losses.det_loss.DiceLoss

Bases: nn.LossBase

Source code in mindocr\losses\det_loss.py
class DiceLoss(nn.LossBase):
    def __init__(self, eps=1e-6):
        super().__init__()
        self._eps = eps

    def construct(self, pred, gt, mask):
        """
        pred: one or two heatmaps of shape (N, 1, H, W),
              the losses of two heatmaps are added together.
        gt: (N, 1, H, W)
        mask: (N, H, W)
        """
        pred = pred.squeeze(axis=1) * mask
        gt = gt.squeeze(axis=1) * mask

        intersection = (pred * gt).sum()
        union = pred.sum() + gt.sum() + self._eps
        return 1 - 2.0 * intersection / union
mindocr.losses.det_loss.DiceLoss.construct(pred, gt, mask)
pred: one or two heatmaps of shape (N, 1, H, W); the losses of the two heatmaps are added together.

Source code in mindocr\losses\det_loss.py
def construct(self, pred, gt, mask):
    """
    pred: one or two heatmaps of shape (N, 1, H, W),
          the losses of two heatmaps are added together.
    gt: (N, 1, H, W)
    mask: (N, H, W)
    """
    pred = pred.squeeze(axis=1) * mask
    gt = gt.squeeze(axis=1) * mask

    intersection = (pred * gt).sum()
    union = pred.sum() + gt.sum() + self._eps
    return 1 - 2.0 * intersection / union
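The same computation in NumPy, on single (H, W) maps, makes the behavior at the extremes concrete (a minimal sketch of the formula above):

```python
import numpy as np

def dice_loss(pred, gt, mask, eps=1e-6):
    """NumPy sketch of the Dice loss above for single (H, W) maps."""
    pred = pred * mask
    gt = gt * mask
    intersection = (pred * gt).sum()
    union = pred.sum() + gt.sum() + eps
    return 1 - 2.0 * intersection / union

ones = np.ones((4, 4))
print(dice_loss(ones, ones, ones) < 1e-6)                     # True: perfect overlap -> ~0
print(abs(dice_loss(np.zeros((4, 4)), ones, ones) - 1.0) < 1e-6)  # True: no overlap -> 1
```

Because intersection and union are computed over the whole map, Dice loss is insensitive to class imbalance between text and background pixels.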
mindocr.losses.det_loss.L1BalancedCELoss

Bases: nn.LossBase

Balanced CrossEntropy Loss on binary, MaskL1Loss on thresh, DiceLoss on thresh_binary. Note: the meaning of the inputs is explained in SegDetectorLossBuilder.

Source code in mindocr\losses\det_loss.py
class L1BalancedCELoss(nn.LossBase):
    """
    Balanced CrossEntropy Loss on `binary`,
    MaskL1Loss on `thresh`,
    DiceLoss on `thresh_binary`.
    Note: the meaning of the inputs is explained in `SegDetectorLossBuilder`.
    """

    def __init__(self, eps=1e-6, bce_scale=5, l1_scale=10, bce_replace="bceloss"):
        super().__init__()

        self.dice_loss = DiceLoss(eps=eps)
        self.l1_loss = MaskL1Loss()

        if bce_replace == "bceloss":
            self.bce_loss = BalancedBCELoss()
        elif bce_replace == "diceloss":
            self.bce_loss = DiceLoss()
        else:
            raise ValueError(f"bce_replace should be in ['bceloss', 'diceloss'], but get {bce_replace}")

        self.l1_scale = l1_scale
        self.bce_scale = bce_scale

    def construct(
        self, pred: Union[Tensor, Tuple[Tensor]], gt: Tensor, gt_mask: Tensor, thresh_map: Tensor, thresh_mask: Tensor
    ):
        """
        Compute dbnet loss
        Args:
            pred (Tuple[Tensor]): network prediction consists of
                binary: The text segmentation prediction.
                thresh: The threshold prediction (optional)
                thresh_binary: Value produced by `step_function(binary - thresh)`. (optional)
            gt (Tensor): Text regions bitmap gt.
            mask (Tensor): Ignore mask; pixels where the value is 1 indicate no contribution to the loss.
            thresh_mask (Tensor): Mask indicates regions cared by thresh supervision.
            thresh_map (Tensor): Threshold gt.
        Return:
            loss value (Tensor)
        """
        if isinstance(pred, ms.Tensor):
            loss = self.bce_loss(pred, gt, gt_mask)
        else:
            binary, thresh, thresh_binary = pred
            bce_loss_output = self.bce_loss(binary, gt, gt_mask)
            l1_loss = self.l1_loss(thresh, thresh_map, thresh_mask)
            dice_loss = self.dice_loss(thresh_binary, gt, gt_mask)
            loss = dice_loss + self.l1_scale * l1_loss + self.bce_scale * bce_loss_output

        return loss
mindocr.losses.det_loss.L1BalancedCELoss.construct(pred, gt, gt_mask, thresh_map, thresh_mask)

Compute dbnet loss

PARAMETER DESCRIPTION
pred

network prediction consists of binary: The text segmentation prediction. thresh: The threshold prediction (optional) thresh_binary: Value produced by step_function(binary - thresh). (optional)

TYPE: Tuple[Tensor]

gt

Text regions bitmap gt.

TYPE: Tensor

gt_mask

Ignore mask; pixels where the value is 1 indicate no contribution to the loss.

TYPE: Tensor

thresh_mask

Mask indicates regions cared by thresh supervision.

TYPE: Tensor

thresh_map

Threshold gt.

TYPE: Tensor

Return

loss value (Tensor)

Source code in mindocr\losses\det_loss.py
def construct(
    self, pred: Union[Tensor, Tuple[Tensor]], gt: Tensor, gt_mask: Tensor, thresh_map: Tensor, thresh_mask: Tensor
):
    """
    Compute dbnet loss
    Args:
        pred (Tuple[Tensor]): network prediction consists of
            binary: The text segmentation prediction.
            thresh: The threshold prediction (optional)
            thresh_binary: Value produced by `step_function(binary - thresh)`. (optional)
        gt (Tensor): Text regions bitmap gt.
        gt_mask (Tensor): Ignore mask; pixels where the value is 1 indicate no contribution to the loss.
        thresh_mask (Tensor): Mask indicates regions cared by thresh supervision.
        thresh_map (Tensor): Threshold gt.
    Return:
        loss value (Tensor)
    """
    if isinstance(pred, ms.Tensor):
        loss = self.bce_loss(pred, gt, gt_mask)
    else:
        binary, thresh, thresh_binary = pred
        bce_loss_output = self.bce_loss(binary, gt, gt_mask)
        l1_loss = self.l1_loss(thresh, thresh_map, thresh_mask)
        dice_loss = self.dice_loss(thresh_binary, gt, gt_mask)
        loss = dice_loss + self.l1_scale * l1_loss + self.bce_scale * bce_loss_output

    return loss
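When all three maps are predicted, the total loss is a weighted sum of the dice, masked-L1 and BCE terms. A minimal sketch of that combination (the scale defaults shown are illustrative assumptions, not necessarily the class defaults):

```python
def combine_db_losses(dice_term, l1_term, bce_term, l1_scale=10.0, bce_scale=5.0):
    # Weighted sum mirroring construct(): loss = dice + l1_scale * l1 + bce_scale * bce.
    # The scale defaults here are illustrative assumptions.
    return dice_term + l1_scale * l1_term + bce_scale * bce_term

total = combine_db_losses(0.5, 0.1, 0.2)  # 0.5 + 10 * 0.1 + 5 * 0.2
```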
mindocr.losses.det_loss.MaskL1Loss

Bases: nn.LossBase

Source code in mindocr\losses\det_loss.py
class MaskL1Loss(nn.LossBase):
    def __init__(self, eps=1e-6):
        super().__init__()
        self._eps = eps

    def construct(self, pred, gt, mask):
        """
        Args:
            pred: shape :math:`(N, 1, H, W)`, the prediction of network
            gt: shape :math:`(N, H, W)`, the target
            mask: shape :math:`(N, H, W)`, the mask indicates positive regions
        """
        pred = pred.squeeze(axis=1)
        return ((pred - gt).abs() * mask).sum() / (mask.sum() + self._eps)
mindocr.losses.det_loss.MaskL1Loss.construct(pred, gt, mask)
PARAMETER DESCRIPTION
pred

shape :math:(N, 1, H, W), the prediction of network

gt

shape :math:(N, H, W), the target

mask

shape :math:(N, H, W), the mask indicates positive regions

Source code in mindocr\losses\det_loss.py
def construct(self, pred, gt, mask):
    """
    Args:
        pred: shape :math:`(N, 1, H, W)`, the prediction of network
        gt: shape :math:`(N, H, W)`, the target
        mask: shape :math:`(N, H, W)`, the mask indicates positive regions
    """
    pred = pred.squeeze(axis=1)
    return ((pred - gt).abs() * mask).sum() / (mask.sum() + self._eps)
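The same masked-L1 computation can be sketched in plain NumPy (a stand-in for illustration, not the MindSpore implementation): the absolute error is summed only over masked pixels and normalized by the mask area.

```python
import numpy as np

def mask_l1_loss_np(pred, gt, mask, eps=1e-6):
    # pred: (N, 1, H, W); gt, mask: (N, H, W). L1 averaged over masked pixels only.
    pred = pred.squeeze(1)
    return (np.abs(pred - gt) * mask).sum() / (mask.sum() + eps)

pred = np.zeros((1, 1, 2, 2)); pred[0, 0, 0, 0] = 1.0
gt = np.zeros((1, 2, 2))
mask = np.ones((1, 2, 2))
loss = mask_l1_loss_np(pred, gt, mask)  # one wrong pixel out of four -> ~0.25
```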
mindocr.losses.det_loss.PSEDiceLoss

Bases: nn.Cell

Source code in mindocr\losses\det_loss.py
class PSEDiceLoss(nn.Cell):
    def __init__(self, alpha=0.7, ohem_ratio=3):
        super().__init__()
        self.threshold0 = Tensor(0.5, mstype.float32)
        self.zero_float32 = Tensor(0.0, mstype.float32)
        self.alpha = alpha
        self.ohem_ratio = ohem_ratio
        self.negative_one_int32 = Tensor(-1, mstype.int32)
        self.concat = ops.Concat()
        self.less_equal = ops.LessEqual()
        self.greater = ops.Greater()
        self.reduce_sum = ops.ReduceSum()
        self.reduce_sum_keep_dims = ops.ReduceSum(keep_dims=True)
        self.reduce_mean = ops.ReduceMean()
        self.reduce_min = ops.ReduceMin()
        self.cast = ops.Cast()
        self.minimum = ops.Minimum()
        self.expand_dims = ops.ExpandDims()
        self.select = ops.Select()
        self.fill = ops.Fill()
        self.topk = ops.TopK(sorted=True)
        self.shape = ops.Shape()
        self.sigmoid = ops.Sigmoid()
        self.reshape = ops.Reshape()
        self.slice = ops.Slice()
        self.logical_and = ops.LogicalAnd()
        self.logical_or = ops.LogicalOr()
        self.equal = ops.Equal()
        self.zeros_like = ops.ZerosLike()
        self.add = ops.Add()
        self.gather = ops.Gather()
        self.upsample = nn.ResizeBilinear()

    def ohem_batch(self, scores, gt_texts, training_masks):
        """

        :param scores: [N * H * W]
        :param gt_texts:  [N * H * W]
        :param training_masks: [N * H * W]
        :return: [N * H * W]
        """
        batch_size = scores.shape[0]
        h, w = scores.shape[1:]
        selected_masks = ()
        for i in range(batch_size):
            score = self.slice(scores, (i, 0, 0), (1, h, w))
            score = self.reshape(score, (h, w))

            gt_text = self.slice(gt_texts, (i, 0, 0), (1, h, w))
            gt_text = self.reshape(gt_text, (h, w))

            training_mask = self.slice(training_masks, (i, 0, 0), (1, h, w))
            training_mask = self.reshape(training_mask, (h, w))

            selected_mask = self.ohem_single(score, gt_text, training_mask)
            selected_masks = selected_masks + (selected_mask,)

        selected_masks = self.concat(selected_masks)
        return selected_masks

    def ohem_single(self, score, gt_text, training_mask):
        h, w = score.shape[0:2]
        k = int(h * w)
        pos_num = self.logical_and(self.greater(gt_text, self.threshold0), self.greater(training_mask, self.threshold0))
        pos_num = self.reduce_sum(self.cast(pos_num, mstype.float32))

        neg_num = self.less_equal(gt_text, self.threshold0)
        neg_num = self.reduce_sum(self.cast(neg_num, mstype.float32))
        neg_num = self.minimum(self.ohem_ratio * pos_num, neg_num)
        neg_num = self.cast(neg_num, mstype.int32)

        neg_num = neg_num + k - 1
        neg_mask = self.less_equal(gt_text, self.threshold0)
        ignore_score = self.fill(mstype.float32, (h, w), -1e3)
        neg_score = self.select(neg_mask, score, ignore_score)
        neg_score = self.reshape(neg_score, (h * w,))

        topk_values, _ = self.topk(neg_score, k)
        threshold = self.gather(topk_values, neg_num, 0)

        selected_mask = self.logical_and(
            self.logical_or(self.greater(score, threshold), self.greater(gt_text, self.threshold0)),
            self.greater(training_mask, self.threshold0),
        )

        selected_mask = self.cast(selected_mask, mstype.float32)
        selected_mask = self.expand_dims(selected_mask, 0)

        return selected_mask

    def dice_loss(self, input_params, target, mask):
        """

        :param input: [N, H, W]
        :param target: [N, H, W]
        :param mask: [N, H, W]
        :return:
        """
        batch_size = input_params.shape[0]
        input_sigmoid = self.sigmoid(input_params)

        input_reshape = self.reshape(input_sigmoid, (batch_size, -1))
        target = self.reshape(target, (batch_size, -1))
        mask = self.reshape(mask, (batch_size, -1))

        input_mask = input_reshape * mask
        target = target * mask

        a = self.reduce_sum(input_mask * target, 1)
        b = self.reduce_sum(input_mask * input_mask, 1) + 0.001
        c = self.reduce_sum(target * target, 1) + 0.001
        d = (2 * a) / (b + c)
        dice_loss = self.reduce_mean(d)
        return 1 - dice_loss

    def avg_losses(self, loss_list):
        loss_kernel = loss_list[0]
        for i in range(1, len(loss_list)):
            loss_kernel += loss_list[i]
        loss_kernel = loss_kernel / len(loss_list)
        return loss_kernel

    def construct(self, model_predict, gt_texts, gt_kernels, training_masks):
        """

        :param model_predict: [N * 7 * H * W]
        :param gt_texts: [N * H * W]
        :param gt_kernels:[N * 6 * H * W]
        :param training_masks:[N * H * W]
        :return:
        """
        batch_size = model_predict.shape[0]
        model_predict = self.upsample(model_predict, scale_factor=4)
        h, w = model_predict.shape[2:]
        texts = self.slice(model_predict, (0, 0, 0, 0), (batch_size, 1, h, w))
        texts = self.reshape(texts, (batch_size, h, w))
        selected_masks_text = self.ohem_batch(texts, gt_texts, training_masks)
        loss_text = self.dice_loss(texts, gt_texts, selected_masks_text)
        kernels = []
        loss_kernels = []
        for i in range(1, 7):
            kernel = self.slice(model_predict, (0, i, 0, 0), (batch_size, 1, h, w))
            kernel = self.reshape(kernel, (batch_size, h, w))
            kernels.append(kernel)

        mask0 = self.sigmoid(texts)
        selected_masks_kernels = self.logical_and(
            self.greater(mask0, self.threshold0), self.greater(training_masks, self.threshold0)
        )
        selected_masks_kernels = self.cast(selected_masks_kernels, mstype.float32)

        for i in range(6):
            gt_kernel = self.slice(gt_kernels, (0, i, 0, 0), (batch_size, 1, h, w))
            gt_kernel = self.reshape(gt_kernel, (batch_size, h, w))
            loss_kernel_i = self.dice_loss(kernels[i], gt_kernel, selected_masks_kernels)
            loss_kernels.append(loss_kernel_i)
        loss_kernel = self.avg_losses(loss_kernels)

        loss = self.alpha * loss_text + (1 - self.alpha) * loss_kernel
        return loss
mindocr.losses.det_loss.PSEDiceLoss.construct(model_predict, gt_texts, gt_kernels, training_masks)

:param model_predict: [N * 7 * H * W]
:param gt_texts: [N * H * W]
:param gt_kernels: [N * 6 * H * W]
:param training_masks: [N * H * W]
:return:

Source code in mindocr\losses\det_loss.py
def construct(self, model_predict, gt_texts, gt_kernels, training_masks):
    """

    :param model_predict: [N * 7 * H * W]
    :param gt_texts: [N * H * W]
    :param gt_kernels:[N * 6 * H * W]
    :param training_masks:[N * H * W]
    :return:
    """
    batch_size = model_predict.shape[0]
    model_predict = self.upsample(model_predict, scale_factor=4)
    h, w = model_predict.shape[2:]
    texts = self.slice(model_predict, (0, 0, 0, 0), (batch_size, 1, h, w))
    texts = self.reshape(texts, (batch_size, h, w))
    selected_masks_text = self.ohem_batch(texts, gt_texts, training_masks)
    loss_text = self.dice_loss(texts, gt_texts, selected_masks_text)
    kernels = []
    loss_kernels = []
    for i in range(1, 7):
        kernel = self.slice(model_predict, (0, i, 0, 0), (batch_size, 1, h, w))
        kernel = self.reshape(kernel, (batch_size, h, w))
        kernels.append(kernel)

    mask0 = self.sigmoid(texts)
    selected_masks_kernels = self.logical_and(
        self.greater(mask0, self.threshold0), self.greater(training_masks, self.threshold0)
    )
    selected_masks_kernels = self.cast(selected_masks_kernels, mstype.float32)

    for i in range(6):
        gt_kernel = self.slice(gt_kernels, (0, i, 0, 0), (batch_size, 1, h, w))
        gt_kernel = self.reshape(gt_kernel, (batch_size, h, w))
        loss_kernel_i = self.dice_loss(kernels[i], gt_kernel, selected_masks_kernels)
        loss_kernels.append(loss_kernel_i)
    loss_kernel = self.avg_losses(loss_kernels)

    loss = self.alpha * loss_text + (1 - self.alpha) * loss_kernel
    return loss
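The final PSE loss blends the text-map loss and the averaged kernel losses with the weight alpha, as in the last lines of construct. A minimal sketch of that combination:

```python
def pse_total_loss(loss_text, kernel_losses, alpha=0.7):
    # total = alpha * text loss + (1 - alpha) * mean of the kernel losses,
    # mirroring construct() and avg_losses().
    loss_kernel = sum(kernel_losses) / len(kernel_losses)
    return alpha * loss_text + (1 - alpha) * loss_kernel

total = pse_total_loss(1.0, [0.5] * 6)  # 0.7 * 1.0 + 0.3 * 0.5
```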
mindocr.losses.det_loss.PSEDiceLoss.dice_loss(input_params, target, mask)

:param input: [N, H, W]
:param target: [N, H, W]
:param mask: [N, H, W]
:return:

Source code in mindocr\losses\det_loss.py
def dice_loss(self, input_params, target, mask):
    """

    :param input: [N, H, W]
    :param target: [N, H, W]
    :param mask: [N, H, W]
    :return:
    """
    batch_size = input_params.shape[0]
    input_sigmoid = self.sigmoid(input_params)

    input_reshape = self.reshape(input_sigmoid, (batch_size, -1))
    target = self.reshape(target, (batch_size, -1))
    mask = self.reshape(mask, (batch_size, -1))

    input_mask = input_reshape * mask
    target = target * mask

    a = self.reduce_sum(input_mask * target, 1)
    b = self.reduce_sum(input_mask * input_mask, 1) + 0.001
    c = self.reduce_sum(target * target, 1) + 0.001
    d = (2 * a) / (b + c)
    dice_loss = self.reduce_mean(d)
    return 1 - dice_loss
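A NumPy mirror of this dice computation, useful for checking the math offline (a sketch, not the MindSpore implementation):

```python
import numpy as np

def pse_dice_loss_np(logits, target, mask):
    # Mirrors dice_loss(): sigmoid -> mask -> smoothed per-sample dice, averaged over the batch.
    n = logits.shape[0]
    p = 1.0 / (1.0 + np.exp(-logits))
    p = p.reshape(n, -1) * mask.reshape(n, -1)
    t = target.reshape(n, -1) * mask.reshape(n, -1)
    a = (p * t).sum(axis=1)
    b = (p * p).sum(axis=1) + 0.001
    c = (t * t).sum(axis=1) + 0.001
    return 1.0 - np.mean(2.0 * a / (b + c))

# Near-perfect prediction (large positive logits on all target pixels) -> loss near 0.
loss = pse_dice_loss_np(np.full((1, 2, 2), 20.0), np.ones((1, 2, 2)), np.ones((1, 2, 2)))
```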
mindocr.losses.det_loss.PSEDiceLoss.ohem_batch(scores, gt_texts, training_masks)

:param scores: [N * H * W]
:param gt_texts: [N * H * W]
:param training_masks: [N * H * W]
:return: [N * H * W]

Source code in mindocr\losses\det_loss.py
def ohem_batch(self, scores, gt_texts, training_masks):
    """

    :param scores: [N * H * W]
    :param gt_texts:  [N * H * W]
    :param training_masks: [N * H * W]
    :return: [N * H * W]
    """
    batch_size = scores.shape[0]
    h, w = scores.shape[1:]
    selected_masks = ()
    for i in range(batch_size):
        score = self.slice(scores, (i, 0, 0), (1, h, w))
        score = self.reshape(score, (h, w))

        gt_text = self.slice(gt_texts, (i, 0, 0), (1, h, w))
        gt_text = self.reshape(gt_text, (h, w))

        training_mask = self.slice(training_masks, (i, 0, 0), (1, h, w))
        training_mask = self.reshape(training_mask, (h, w))

        selected_mask = self.ohem_single(score, gt_text, training_mask)
        selected_masks = selected_masks + (selected_mask,)

    selected_masks = self.concat(selected_masks)
    return selected_masks
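The OHEM selection above keeps all positive pixels plus the hardest (highest-scoring) negatives, capped at ohem_ratio times the positive count. A simplified NumPy sketch of that idea for a single sample (not a line-for-line port of ohem_single):

```python
import numpy as np

def ohem_single_np(score, gt_text, training_mask, ohem_ratio=3):
    # Positives: text pixels inside the valid training mask.
    pos = (gt_text > 0.5) & (training_mask > 0.5)
    neg = gt_text <= 0.5
    # Cap the number of negatives at ohem_ratio * number of positives.
    n_neg = min(int(neg.sum()), ohem_ratio * int(pos.sum()))
    neg_scores = np.sort(score[neg])[::-1]                      # hardest negatives first
    threshold = neg_scores[n_neg - 1] if n_neg > 0 else np.inf
    selected = ((score >= threshold) | (gt_text > 0.5)) & (training_mask > 0.5)
    return selected.astype(np.float32)

score = np.array([[0.9, 0.1], [0.8, 0.2]], dtype=np.float32)
gt = np.array([[1.0, 0.0], [0.0, 0.0]], dtype=np.float32)
mask = np.ones((2, 2), dtype=np.float32)
sel = ohem_single_np(score, gt, mask)
```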
mindocr.losses.rec_loss
mindocr.losses.rec_loss.CTCLoss

Bases: LossBase

CTCLoss definition

PARAMETER DESCRIPTION
pred_seq_len(int)

the length of the predicted character sequence. For text images, this value equals W, the width of the feature map encoded by the visual backbone. It can be obtained by probing the output shape in the network. E.g., for a training image in shape (3, 32, 100), the feature map encoded by a resnet34 backbone has shape (512, 1, 4), so W = 4 and the sequence length is 4.

max_label_len(int)

the maximum number of characters in a text label, i.e. max_text_len in yaml.

batch_size(int)

batch size of the input logits.

Source code in mindocr\losses\rec_loss.py
class CTCLoss(LossBase):
    """
    CTCLoss definition

    Args:
        pred_seq_len(int): the length of the predicted character sequence. For text images, this value equals
            W - the width of the feature map encoded by the visual backbone.
            This can be obtained by probing the output shape in the network.
            E.g., for a training image in shape (3, 32, 100), the feature map encoded by a resnet34 backbone is
            in shape (512, 1, 4), so W = 4 and the sequence length is 4.
        max_label_len(int): the maximum number of characters in a text label, i.e. max_text_len in yaml.
        batch_size(int): batch size of the input logits.
    """

    def __init__(
        self, pred_seq_len: int = 26, max_label_len: int = 25, batch_size: int = 32, reduction: str = "mean"
    ) -> None:
        super(CTCLoss, self).__init__(reduction=reduction)
        assert pred_seq_len > max_label_len, (
            "pred_seq_len is required to be larger than max_label_len for CTCLoss. Please adjust the strides in the "
            "backbone, or reduce max_text_length in yaml"
        )
        self.sequence_length = Tensor(np.array([pred_seq_len] * batch_size), ms.int32)

        label_indices = []
        for i in range(batch_size):
            for j in range(max_label_len):
                label_indices.append([i, j])
        self.label_indices = Tensor(np.array(label_indices), ms.int64)
        self.ctc_loss = ops.CTCLoss(ctc_merge_repeated=True)

    def construct(self, pred: Tensor, label: Tensor) -> Tensor:
        """
        Args:
            pred (Tensor): network prediction which is a
                logit Tensor in shape (W, BS, NC), where W - seq len, BS - batch size. NC - num of classes
                (types of character + blank + 1)
            label (Tensor): GT sequence of character indices in shape (BS, SL), SL - sequence length, which is padded to
                max_text_length
        Returns:
            loss value (Tensor)
        """
        logit = pred
        label_values = ops.reshape(label, (-1,))

        loss, _ = self.ctc_loss(logit, self.label_indices, label_values, self.sequence_length)
        loss = self.get_loss(loss)
        return loss
mindocr.losses.rec_loss.CTCLoss.construct(pred, label)
PARAMETER DESCRIPTION
pred

network prediction, a logit Tensor in shape (W, BS, NC), where W is the sequence length, BS the batch size, and NC the number of classes (character types + blank + 1)

TYPE: Tensor

label

GT sequence of character indices in shape (BS, SL), SL - sequence length, which is padded to max_text_length

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

loss value (Tensor)

Source code in mindocr\losses\rec_loss.py
def construct(self, pred: Tensor, label: Tensor) -> Tensor:
    """
    Args:
        pred (Tensor): network prediction which is a
            logit Tensor in shape (W, BS, NC), where W - seq len, BS - batch size. NC - num of classes
            (types of character + blank + 1)
        label (Tensor): GT sequence of character indices in shape (BS, SL), SL - sequence length, which is padded to
            max_text_length
    Returns:
        loss value (Tensor)
    """
    logit = pred
    label_values = ops.reshape(label, (-1,))

    loss, _ = self.ctc_loss(logit, self.label_indices, label_values, self.sequence_length)
    loss = self.get_loss(loss)
    return loss
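The constructor above precomputes label_indices, the dense (sample, position) index pairs that ops.CTCLoss expects for its sparse label representation. A small stand-alone sketch of that construction:

```python
import numpy as np

def build_label_indices(batch_size, max_label_len):
    # Dense (sample, position) index pairs for the sparse label tensor of CTC loss,
    # matching the nested loop in CTCLoss.__init__.
    return np.array(
        [[i, j] for i in range(batch_size) for j in range(max_label_len)],
        dtype=np.int64,
    )

idx = build_label_indices(2, 3)  # shape (6, 2): [[0,0],[0,1],[0,2],[1,0],[1,1],[1,2]]
```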

mindocr.metrics

mindocr.metrics.build_metric(config, device_num=1, **kwargs)

Create the metric function.

PARAMETER DESCRIPTION
config

configuration for metric including metric name and also the kwargs specifically for each metric.
- name (str): metric function name, exactly the same as one of the supported metric class names

TYPE: dict

device_num

number of devices. If device_num > 1, metric will be computed in distributed mode, i.e., aggregate intermediate variables (e.g., num_correct, TP) from all devices by ops.AllReduce op so as to correctly compute the metric on dispatched data.

TYPE: int DEFAULT: 1

Return

nn.Metric

Example
Create a RecMetric module for text recognition

from mindocr.metrics import build_metric
metric_config = {"name": "RecMetric", "main_indicator": "acc", "character_dict_path": None, "ignore_space": True, "print_flag": False}
metric = build_metric(metric_config)
metric

Source code in mindocr\metrics\builder.py
def build_metric(config, device_num=1, **kwargs):
    """
    Create the metric function.

    Args:
        config (dict): configuration for metric including metric `name` and also the kwargs specifically for
            each metric.
            - name (str): metric function name, exactly the same as one of the supported metric class names
        device_num (int): number of devices. If device_num > 1, metric will be computed in distributed mode,
            i.e., aggregate intermediate variables (e.g., num_correct, TP) from all devices
            by `ops.AllReduce` op so as to correctly
            compute the metric on dispatched data.

    Return:
        nn.Metric

    Example:
        >>> # Create a RecMetric module for text recognition
        >>> from mindocr.metrics import build_metric
        >>> metric_config = {"name": "RecMetric", "main_indicator": "acc", "character_dict_path": None,
        "ignore_space": True, "print_flag": False}
        >>> metric = build_metric(metric_config)
        >>> metric
        <mindocr.metrics.rec_metrics.RecMetric>
    """

    mn = config.pop("name")
    if mn in supported_metrics:
        device_num = 1 if device_num is None else device_num
        config.update({"device_num": device_num})
        metric = eval(mn)(**config)
    else:
        raise ValueError(f"Invalid metric name {mn}, support metrics are {supported_metrics}")

    return metric
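build_metric dispatches on the name key and forwards the remaining config entries as constructor kwargs. A minimal self-contained sketch of the same dispatch pattern (the registry and DummyMetric class are hypothetical stand-ins, not part of mindocr):

```python
class DummyMetric:
    """Hypothetical stand-in for a registered metric class."""
    def __init__(self, device_num=1, **kwargs):
        self.device_num = device_num

SUPPORTED = {"DummyMetric": DummyMetric}  # stand-in for supported_metrics

def build_metric_sketch(config, device_num=1):
    cfg = dict(config)  # copy so the caller's dict is not mutated (the original uses config.pop)
    name = cfg.pop("name")
    if name not in SUPPORTED:
        raise ValueError(f"Invalid metric name {name}, supported metrics are {list(SUPPORTED)}")
    cfg["device_num"] = device_num if device_num is not None else 1
    return SUPPORTED[name](**cfg)

m = build_metric_sketch({"name": "DummyMetric"}, device_num=2)
```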
mindocr.metrics.builder
mindocr.metrics.builder.build_metric(config, device_num=1, **kwargs)

Create the metric function.

PARAMETER DESCRIPTION
config

configuration for metric including metric name and also the kwargs specifically for each metric.
- name (str): metric function name, exactly the same as one of the supported metric class names

TYPE: dict

device_num

number of devices. If device_num > 1, metric will be computed in distributed mode, i.e., aggregate intermediate variables (e.g., num_correct, TP) from all devices by ops.AllReduce op so as to correctly compute the metric on dispatched data.

TYPE: int DEFAULT: 1

Return

nn.Metric

Example
Create a RecMetric module for text recognition

from mindocr.metrics import build_metric
metric_config = {"name": "RecMetric", "main_indicator": "acc", "character_dict_path": None, "ignore_space": True, "print_flag": False}
metric = build_metric(metric_config)
metric

Source code in mindocr\metrics\builder.py
def build_metric(config, device_num=1, **kwargs):
    """
    Create the metric function.

    Args:
        config (dict): configuration for metric including metric `name` and also the kwargs specifically for
            each metric.
            - name (str): metric function name, exactly the same as one of the supported metric class names
        device_num (int): number of devices. If device_num > 1, metric will be computed in distributed mode,
            i.e., aggregate intermediate variables (e.g., num_correct, TP) from all devices
            by `ops.AllReduce` op so as to correctly
            compute the metric on dispatched data.

    Return:
        nn.Metric

    Example:
        >>> # Create a RecMetric module for text recognition
        >>> from mindocr.metrics import build_metric
        >>> metric_config = {"name": "RecMetric", "main_indicator": "acc", "character_dict_path": None,
        "ignore_space": True, "print_flag": False}
        >>> metric = build_metric(metric_config)
        >>> metric
        <mindocr.metrics.rec_metrics.RecMetric>
    """

    mn = config.pop("name")
    if mn in supported_metrics:
        device_num = 1 if device_num is None else device_num
        config.update({"device_num": device_num})
        metric = eval(mn)(**config)
    else:
        raise ValueError(f"Invalid metric name {mn}, support metrics are {supported_metrics}")

    return metric
mindocr.metrics.cls_metrics
mindocr.metrics.cls_metrics.ClsMetric

Bases: object

Compute the text direction classification accuracy.

Source code in mindocr\metrics\cls_metrics.py
class ClsMetric(object):
    """Compute the text direction classification accuracy."""

    def __init__(self, label_list=None, **kwargs):
        """
        label_list: Set in yaml config file, map the gts back to original label format (angle).
        """
        assert (
            label_list is not None
        ), "`label_list` should not be None. Please set it in 'metric' section in yaml config file."
        self.label_list = label_list
        self.eps = 1e-5
        self.metric_names = ["acc"]
        self.clear()

    def update(self, *inputs):
        preds, gts = inputs
        preds = preds["angles"]
        if isinstance(gts, list):
            gts = gts[0]
        gts = [self.label_list[i] for i in gts]

        correct_num = 0
        all_num = 0
        for pred, target in zip(preds, gts):
            if pred == target:
                correct_num += 1
            all_num += 1
        self.correct_num += correct_num
        self.all_num += all_num

    def eval(self):
        acc = self.correct_num / (self.all_num + self.eps)
        self.clear()
        return {"acc": acc}

    def clear(self):
        self.correct_num = 0
        self.all_num = 0
mindocr.metrics.cls_metrics.ClsMetric.__init__(label_list=None, **kwargs)
Source code in mindocr\metrics\cls_metrics.py
def __init__(self, label_list=None, **kwargs):
    """
    label_list: Set in yaml config file, map the gts back to original label format (angle).
    """
    assert (
        label_list is not None
    ), "`label_list` should not be None. Please set it in 'metric' section in yaml config file."
    self.label_list = label_list
    self.eps = 1e-5
    self.metric_names = ["acc"]
    self.clear()
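ClsMetric follows the usual clear/update/eval accumulator protocol: counters grow across batches and eval() reports the ratio and resets. The pattern can be sketched in plain Python (a stand-in, independent of MindSpore):

```python
class AccuracyAccumulator:
    """Same clear/update/eval pattern as ClsMetric, on plain label lists."""
    def __init__(self, eps=1e-5):
        self.eps = eps
        self.clear()

    def clear(self):
        self.correct_num = 0
        self.all_num = 0

    def update(self, preds, gts):
        # Count exact matches between predicted and ground-truth labels.
        self.correct_num += sum(p == t for p, t in zip(preds, gts))
        self.all_num += len(gts)

    def eval(self):
        acc = self.correct_num / (self.all_num + self.eps)
        self.clear()  # reset counters after reporting, as ClsMetric does
        return {"acc": acc}

acc = AccuracyAccumulator()
acc.update(["0", "180"], ["0", "90"])
result = acc.eval()
```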
mindocr.metrics.det_metrics
mindocr.metrics.det_metrics.DetMetric

Bases: nn.Metric

Source code in mindocr\metrics\det_metrics.py
class DetMetric(nn.Metric):
    """Text detection metric: computes recall, precision and f-score of predicted polygons against ground truth via IoU-based matching."""

    def __init__(self, device_num=1, **kwargs):
        super().__init__()
        self._evaluator = DetectionIoUEvaluator()
        self._gt_labels, self._det_labels = [], []
        self.device_num = device_num
        self.all_reduce = None if device_num == 1 else ops.AllReduce()
        self.metric_names = ["recall", "precision", "f-score"]

    def clear(self):
        self._gt_labels, self._det_labels = [], []

    def update(self, *inputs):
        """
        compute metric on a batch of data

        Args:
            inputs (tuple): contain two elements preds, gt
                    preds (dict): text detection prediction as a dictionary with keys:
                        polys: np.ndarray of shape (N, K, 4, 2)
                        score: np.ndarray of shape (N, K), confidence score
                    gts (tuple): ground truth
                        - (polygons, ignore_tags), where polygons are in shape [num_images, num_boxes, 4, 2],
                        ignore_tags are in shape [num_images, num_boxes], which can be defined by output_columns in yaml
        """
        preds, gts = inputs
        preds = preds["polys"]
        polys, ignore = gts[0].asnumpy().astype(np.float32), gts[1].asnumpy()

        for sample_id in range(len(polys)):
            gt = [{"polys": poly, "ignore": ig} for poly, ig in zip(polys[sample_id], ignore[sample_id])]
            gt_label, det_label = self._evaluator(gt, preds[sample_id])
            self._gt_labels.append(gt_label)
            self._det_labels.append(det_label)

    @ms_function
    def all_reduce_fun(self, x):
        res = self.all_reduce(x)
        return res

    def cal_matrix(self, det_lst, gt_lst):
        tp = np.sum((gt_lst == 1) * (det_lst == 1))
        fn = np.sum((gt_lst == 1) * (det_lst == 0))
        fp = np.sum((gt_lst == 0) * (det_lst == 1))
        return tp, fp, fn

    def eval(self):
        """
        Evaluate by aggregating results from batch update

        Returns: dict, average precision, recall, f1-score of all samples
            precision: precision,
            recall: recall,
            f-score: f-score
        """
        # flatten predictions and labels into 1D-array
        self._det_labels = np.array([lbl for label in self._det_labels for lbl in label])
        self._gt_labels = np.array([lbl for label in self._gt_labels for lbl in label])

        tp, fp, fn = self.cal_matrix(self._det_labels, self._gt_labels)
        if self.all_reduce:
            tp = float(self.all_reduce_fun(Tensor(tp, ms.float32)).asnumpy())
            fp = float(self.all_reduce_fun(Tensor(fp, ms.float32)).asnumpy())
            fn = float(self.all_reduce_fun(Tensor(fn, ms.float32)).asnumpy())

        recall = _safe_divide(tp, (tp + fn))
        precision = _safe_divide(tp, (tp + fp))
        f_score = _safe_divide(2 * recall * precision, (recall + precision))
        return {"recall": recall, "precision": precision, "f-score": f_score}
mindocr.metrics.det_metrics.DetMetric.eval()

Evaluate by aggregating results from batch update

RETURNS DESCRIPTION
dict

average precision, recall and f-score of all samples, under the keys recall, precision and f-score

Source code in mindocr\metrics\det_metrics.py
def eval(self):
    """
    Evaluate by aggregating results from batch update

    Returns: dict, average precision, recall, f1-score of all samples
        precision: precision,
        recall: recall,
        f-score: f-score
    """
    # flatten predictions and labels into 1D-array
    self._det_labels = np.array([lbl for label in self._det_labels for lbl in label])
    self._gt_labels = np.array([lbl for label in self._gt_labels for lbl in label])

    tp, fp, fn = self.cal_matrix(self._det_labels, self._gt_labels)
    if self.all_reduce:
        tp = float(self.all_reduce_fun(Tensor(tp, ms.float32)).asnumpy())
        fp = float(self.all_reduce_fun(Tensor(fp, ms.float32)).asnumpy())
        fn = float(self.all_reduce_fun(Tensor(fn, ms.float32)).asnumpy())

    recall = _safe_divide(tp, (tp + fn))
    precision = _safe_divide(tp, (tp + fp))
    f_score = _safe_divide(2 * recall * precision, (recall + precision))
    return {"recall": recall, "precision": precision, "f-score": f_score}
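The final aggregation reduces the flattened 0/1 match labels to tp/fp/fn and then to precision, recall and f-score. A NumPy sketch of that step (the eps guard here is an assumption standing in for the _safe_divide helper):

```python
import numpy as np

def det_scores(det_labels, gt_labels, eps=1e-9):
    # tp/fp/fn over flattened 0/1 match labels, then precision, recall and f-score.
    det, gt = np.asarray(det_labels), np.asarray(gt_labels)
    tp = np.sum((gt == 1) & (det == 1))
    fp = np.sum((gt == 0) & (det == 1))
    fn = np.sum((gt == 1) & (det == 0))
    recall = tp / max(tp + fn, eps)
    precision = tp / max(tp + fp, eps)
    f_score = 2 * recall * precision / max(recall + precision, eps)
    return {"recall": recall, "precision": precision, "f-score": f_score}

scores = det_scores([1, 1, 0, 1], [1, 0, 1, 1])  # tp=2, fp=1, fn=1
```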
mindocr.metrics.det_metrics.DetMetric.update(*inputs)

compute metric on a batch of data

PARAMETER DESCRIPTION
inputs

contain two elements: preds, gts.
preds (dict): text detection prediction as a dictionary with keys: polys: np.ndarray of shape (N, K, 4, 2); score: np.ndarray of shape (N, K), confidence score.
gts (tuple): ground truth (polygons, ignore_tags), where polygons are in shape [num_images, num_boxes, 4, 2] and ignore_tags are in shape [num_images, num_boxes], which can be defined by output_columns in yaml.

TYPE: tuple DEFAULT: ()

Source code in mindocr\metrics\det_metrics.py
def update(self, *inputs):
    """
    compute metric on a batch of data

    Args:
        inputs (tuple): contain two elements preds, gt
                preds (dict): text detection prediction as a dictionary with keys:
                    polys: np.ndarray of shape (N, K, 4, 2)
                    score: np.ndarray of shape (N, K), confidence score
                gts (tuple): ground truth
                    - (polygons, ignore_tags), where polygons are in shape [num_images, num_boxes, 4, 2],
                    ignore_tags are in shape [num_images, num_boxes], which can be defined by output_columns in yaml
    """
    preds, gts = inputs
    preds = preds["polys"]
    polys, ignore = gts[0].asnumpy().astype(np.float32), gts[1].asnumpy()

    for sample_id in range(len(polys)):
        gt = [{"polys": poly, "ignore": ig} for poly, ig in zip(polys[sample_id], ignore[sample_id])]
        gt_label, det_label = self._evaluator(gt, preds[sample_id])
        self._gt_labels.append(gt_label)
        self._det_labels.append(det_label)
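The shape contract above is the usual stumbling block when wiring up `update`. A small NumPy helper that checks it and yields the per-sample ground-truth dictionaries exactly as `update` builds them (the helper itself is illustrative, not part of the source):

```python
import numpy as np

def batch_to_gt_samples(polys, ignore_tags):
    # polys: (num_images, num_boxes, 4, 2) float array of quadrilaterals
    # ignore_tags: (num_images, num_boxes) bool array marking boxes to skip
    polys, ignore_tags = np.asarray(polys), np.asarray(ignore_tags)
    assert polys.ndim == 4 and polys.shape[2:] == (4, 2), polys.shape
    assert ignore_tags.shape == polys.shape[:2], ignore_tags.shape
    return [
        [{"polys": p, "ignore": bool(ig)} for p, ig in zip(sample_polys, sample_ignore)]
        for sample_polys, sample_ignore in zip(polys, ignore_tags)
    ]
```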
mindocr.metrics.rec_metrics

Metric for accuracy evaluation.

mindocr.metrics.rec_metrics.RecMetric

Bases: nn.Metric

Define accuracy metric for warpctc network.

PARAMETER DESCRIPTION
ignore_space

remove space in prediction and ground truth text if True

DEFAULT: True

filter_ood

filter out-of-dictionary characters (e.g., '$' for the default digit+en dictionary) in ground truth text. Default is True.

DEFAULT: True

lower

convert GT text to lower case. Recommended to set True if the dictionary does not contain uppercase letters

DEFAULT: True

Notes

Since the OOD characters are skipped during label encoding in data transformation by default, filter_ood should be True. (Paddle skips the OOD characters in label encoding and then decodes the label indices back to a text string, which therefore contains no OOD characters.)

Source code in mindocr\metrics\rec_metrics.py
class RecMetric(nn.Metric):
    """
    Define accuracy metric for warpctc network.

    Args:
        ignore_space: remove space in prediction and ground truth text if True
        filter_ood: filter out-of-dictionary characters (e.g., '$' for the default digit+en dictionary) in
            ground truth text. Default is True.
        lower: convert GT text to lower case. Recommended to set True if the dictionary does not contain uppercase letters

    Notes:
        Since the OOD characters are skipped during label encoding in data transformation by default,
        filter_ood should be True. (Paddle skips the OOD characters in label encoding and then decodes the label
        indices back to a text string, which therefore contains no OOD characters.)
    """

    def __init__(
        self,
        character_dict_path=None,
        ignore_space=True,
        filter_ood=True,
        lower=True,
        print_flag=False,
        device_num=1,
        **kwargs
    ):
        super().__init__()
        self.clear()
        self.ignore_space = ignore_space
        self.filter_ood = filter_ood
        self.lower = lower
        self.print_flag = print_flag

        self.device_num = device_num
        self.all_reduce = None if device_num == 1 else ops.AllReduce()
        self.metric_names = ["acc", "norm_edit_distance"]

        # TODO: use parsed dictionary object
        if character_dict_path is None:
            self.dict = [c for c in "0123456789abcdefghijklmnopqrstuvwxyz"]
        else:
            self.dict = []
            with open(character_dict_path, "r") as f:
                for line in f:
                    c = line.rstrip("\n\r")
                    self.dict.append(c)

    def clear(self):
        self._correct_num = ms.Tensor(0, dtype=ms.int32)
        self._total_num = ms.Tensor(0, dtype=ms.float32)  # avoid int divisor
        self._norm_edit_dis = ms.Tensor(0.0, dtype=ms.float32)

    def update(self, *inputs):
        """
        Updates the internal evaluation result

        Args:
            inputs (tuple): contain two elements preds, gt
                    preds (dict): prediction output by postprocess, keys:
                        - texts, List[str], batch of predicted text strings, shape [BS, ]
                        - confs (optional), List[float], batch of confidence values for the prediction
                    gt (tuple or list): ground truth, order defined by output_columns in eval dataloader.
                        require element:
                        gt_texts, the ground truth texts (padded to a fixed length), shape [BS, ]
                        gt_lens (optional), length of original text if padded, shape [BS, ]

        Raises:
            ValueError: If the number of the inputs is not 2.
        """

        if len(inputs) != 2:
            raise ValueError("Length of inputs should be 2")
        preds, gt = inputs
        pred_texts = preds["texts"]
        # pred_confs = preds['confs']
        # print('pred: ', pred_texts, len(pred_texts))

        # remove padded chars in GT
        if isinstance(gt, tuple) or isinstance(gt, list):
            gt_texts = gt[0]  # text string padded
            gt_lens = gt[1]  # text length

            if isinstance(gt_texts, ms.Tensor):
                gt_texts = gt_texts.asnumpy()
                gt_lens = gt_lens.asnumpy()

            gt_texts = [gt_texts[i][:l] for i, l in enumerate(gt_lens)]
        else:
            gt_texts = gt
            if isinstance(gt_texts, ms.Tensor):
                gt_texts = gt_texts.asnumpy()

        # print('2: ', gt_texts)
        for pred, label in zip(pred_texts, gt_texts):
            # print('pred', pred, 'END')
            # print('label ', label, 'END')

            if self.ignore_space:
                pred = pred.replace(" ", "")
                label = label.replace(" ", "")

            if self.lower:  # convert to lower case
                label = label.lower()
                pred = pred.lower()

            if self.filter_ood:  # filter out of dictionary characters
                label = "".join([c for c in label if c in self.dict])

            if self.print_flag:
                print(pred, " :: ", label)

            edit_distance = Levenshtein.normalized_distance(pred, label)
            self._norm_edit_dis += edit_distance
            if pred == label:
                self._correct_num += 1

            self._total_num += 1

    @ms_function
    def all_reduce_fun(self, x):
        res = self.all_reduce(x)
        return res

    def eval(self):
        if self._total_num == 0:
            raise RuntimeError("Accuracy cannot be calculated because the number of samples is 0.")
        print("correct num: ", self._correct_num, ", total num: ", self._total_num)

        if self.all_reduce:
            # sum over all devices
            correct_num = self.all_reduce_fun(self._correct_num)
            norm_edit_dis = self.all_reduce_fun(self._norm_edit_dis)
            total_num = self.all_reduce_fun(self._total_num)
        else:
            correct_num = self._correct_num
            norm_edit_dis = self._norm_edit_dis
            total_num = self._total_num

        sequence_accuracy = float((correct_num / total_num).asnumpy())
        norm_edit_distance = float((1 - norm_edit_dis / total_num).asnumpy())

        return {"acc": sequence_accuracy, "norm_edit_distance": norm_edit_distance}
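The per-pair logic in `update` plus the final division in `eval` can be condensed into a dependency-free sketch; the plain-Python Levenshtein below stands in for the `Levenshtein.normalized_distance` call (which, for unit edit costs, equals the edit distance divided by the longer string's length):

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def rec_metric(pred_texts, gt_texts, ignore_space=True, lower=True):
    # Mirrors RecMetric's running sums of exact matches and normalized edit distances.
    correct, norm_edit_dis = 0, 0.0
    for pred, label in zip(pred_texts, gt_texts):
        if ignore_space:
            pred, label = pred.replace(" ", ""), label.replace(" ", "")
        if lower:
            pred, label = pred.lower(), label.lower()
        denom = max(len(pred), len(label)) or 1
        norm_edit_dis += levenshtein(pred, label) / denom
        correct += pred == label
    n = len(pred_texts)
    return {"acc": correct / n, "norm_edit_distance": 1 - norm_edit_dis / n}
```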
mindocr.metrics.rec_metrics.RecMetric.update(*inputs)

Updates the internal evaluation result

PARAMETER DESCRIPTION
inputs

contain two elements: preds, gt.
preds (dict): prediction output by postprocess, with keys:
    - texts, List[str], batch of predicted text strings, shape [BS, ]
    - confs (optional), List[float], batch of confidence values for the prediction
gt (tuple or list): ground truth, order defined by output_columns in eval dataloader. Required element:
    gt_texts, the ground truth texts (padded to a fixed length), shape [BS, ]
    gt_lens (optional), length of original text if padded, shape [BS, ]

TYPE: tuple DEFAULT: ()

RAISES DESCRIPTION
ValueError

If the number of the inputs is not 2.

Source code in mindocr\metrics\rec_metrics.py
def update(self, *inputs):
    """
    Updates the internal evaluation result

    Args:
        inputs (tuple): contain two elements preds, gt
                preds (dict): prediction output by postprocess, keys:
                    - texts, List[str], batch of predicted text strings, shape [BS, ]
                    - confs (optional), List[float], batch of confidence values for the prediction
                gt (tuple or list): ground truth, order defined by output_columns in eval dataloader.
                    require element:
                    gt_texts, the ground truth texts (padded to a fixed length), shape [BS, ]
                    gt_lens (optional), length of original text if padded, shape [BS, ]

    Raises:
        ValueError: If the number of the inputs is not 2.
    """

    if len(inputs) != 2:
        raise ValueError("Length of inputs should be 2")
    preds, gt = inputs
    pred_texts = preds["texts"]
    # pred_confs = preds['confs']
    # print('pred: ', pred_texts, len(pred_texts))

    # remove padded chars in GT
    if isinstance(gt, tuple) or isinstance(gt, list):
        gt_texts = gt[0]  # text string padded
        gt_lens = gt[1]  # text length

        if isinstance(gt_texts, ms.Tensor):
            gt_texts = gt_texts.asnumpy()
            gt_lens = gt_lens.asnumpy()

        gt_texts = [gt_texts[i][:l] for i, l in enumerate(gt_lens)]
    else:
        gt_texts = gt
        if isinstance(gt_texts, ms.Tensor):
            gt_texts = gt_texts.asnumpy()

    # print('2: ', gt_texts)
    for pred, label in zip(pred_texts, gt_texts):
        # print('pred', pred, 'END')
        # print('label ', label, 'END')

        if self.ignore_space:
            pred = pred.replace(" ", "")
            label = label.replace(" ", "")

        if self.lower:  # convert to lower case
            label = label.lower()
            pred = pred.lower()

        if self.filter_ood:  # filter out of dictionary characters
            label = "".join([c for c in label if c in self.dict])

        if self.print_flag:
            print(pred, " :: ", label)

        edit_distance = Levenshtein.normalized_distance(pred, label)
        self._norm_edit_dis += edit_distance
        if pred == label:
            self._correct_num += 1

        self._total_num += 1

mindocr.models

mindocr.models.backbones
mindocr.models.backbones.builder
mindocr.models.backbones.builder.build_backbone(name, **kwargs)

Build the backbone network.

PARAMETER DESCRIPTION
name

the backbone name, which can be a registered backbone class name or a registered backbone (function) name.

TYPE: str

kwargs

input args for the backbone. 1) If name is a registered backbone function (e.g. det_resnet50), kwargs include creation args such as pretrained. 2) If name is a registered backbone class (e.g. DetResNet50), kwargs include configuration args such as layers. pretrained can be bool or str: if bool, model weights are loaded from the default url defined in the backbone py file; if str, it can be a url or a local path to a checkpoint.

TYPE: dict DEFAULT: {}

Return

nn.Cell for backbone module

Construct

Input: Tensor
Output: List[Tensor]

Example
build using backbone function name

from mindocr.models.backbones import build_backbone
backbone = build_backbone('det_resnet50', pretrained=True)

build using backbone class name

from mindocr.models.backbones.mindcv_models.resnet import Bottleneck
cfg_from_class = dict(name='DetResNet', block=Bottleneck, layers=[3, 4, 6, 3])
backbone = build_backbone(**cfg_from_class)
print(backbone)

Source code in mindocr\models\backbones\builder.py
def build_backbone(name, **kwargs):
    """
    Build the backbone network.

    Args:
        name (str): the backbone name, which can be a registered backbone class name
                        or a registered backbone (function) name.
        kwargs (dict): input args for the backbone
           1) if `name` is in the registered backbones (e.g. det_resnet50), kwargs include args for backbone creating
           like `pretrained`
           2) if `name` is in the registered backbones class (e.g. DetResNet50), kwargs include args for the backbone
           configuration like `layers`.
           - pretrained: can be bool or str. If bool, load model weights from default url defined in the backbone py
           file. If str, pretrained can be url or local path to a checkpoint.


    Return:
        nn.Cell for backbone module

    Construct:
        Input: Tensor
        Output: List[Tensor]

    Example:
        >>> # build using backbone function name
        >>> from mindocr.models.backbones import build_backbone
        >>> backbone = build_backbone('det_resnet50', pretrained=True)
        >>> # build using backbone class name
        >>> from mindocr.models.backbones.mindcv_models.resnet import Bottleneck
        >>> cfg_from_class = dict(name='DetResNet', block=Bottleneck, layers=[3, 4, 6, 3])
        >>> backbone = build_backbone(**cfg_from_class)
        >>> print(backbone)
    """
    remove_prefix = kwargs.pop("remove_prefix", False)

    if is_backbone(name):
        create_fn = backbone_entrypoint(name)
        backbone = create_fn(**kwargs)
    elif is_backbone_class(name):
        backbone_class = backbone_class_entrypoint(name)
        backbone = backbone_class(**kwargs)
    elif 'mindcv' in name:
        # you can add `feature_only` parameter and `out_indices` in kwargs to extract intermediate features.
        backbone = MindCVBackboneWrapper(name, **kwargs)
    else:
        raise ValueError(f'Invalid backbone name: {name}, supported backbones are: {list_backbones()}')

    if 'pretrained' in kwargs:
        pretrained = kwargs['pretrained']
        if not isinstance(pretrained, bool):
            if remove_prefix:
                # remove the prefix with `backbone.`
                def fn(x): return {k.replace('backbone.', ''): v for k, v in x.items()}
            else:
                fn = None
            load_model(backbone, pretrained, filter_fn=fn)
        # No need to load again if pretrained is bool and True, because the pretrained backbone is already loaded
        # in the backbone definition function.

    return backbone
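`build_backbone` dispatches on the name through a registry populated by decorators such as `@register_backbone`. A minimal sketch of that pattern (the registry internals and the stub creator are illustrative; only the helper names mirror the source):

```python
_backbone_registry = {}

def register_backbone(fn):
    # Decorator: expose a creator function under its own name.
    _backbone_registry[fn.__name__] = fn
    return fn

def is_backbone(name):
    return name in _backbone_registry

def backbone_entrypoint(name):
    return _backbone_registry[name]

@register_backbone
def det_resnet50(pretrained=False, **kwargs):
    # Stand-in for the real creator, which would build and return an nn.Cell.
    return {"name": "det_resnet50", "pretrained": pretrained, **kwargs}

def build_backbone(name, **kwargs):
    if is_backbone(name):
        return backbone_entrypoint(name)(**kwargs)
    raise ValueError(f"Invalid backbone name: {name}")
```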
mindocr.models.backbones.cls_mobilenet_v3
mindocr.models.backbones.cls_mobilenet_v3.cls_mobilenet_v3_small_100(pretrained=True, in_channels=3, **kwargs)

Get small MobileNetV3 model without width scaling.

Source code in mindocr\models\backbones\cls_mobilenet_v3.py
@register_backbone
def cls_mobilenet_v3_small_100(pretrained: bool = True, in_channels: int = 3, **kwargs):
    """Get small MobileNetV3 model without width scaling.
    """
    model = ClsMobileNetV3(arch="small", alpha=1.0, in_channels=in_channels, **kwargs)

    # load pretrained weights
    if pretrained:
        default_cfg = default_cfgs['mobilenet_v3_small_1.0']
        load_pretrained(model, default_cfg)

    return model
mindocr.models.backbones.det_mobilenet
mindocr.models.backbones.det_resnet
mindocr.models.backbones.mindcv_models

models init

mindocr.models.backbones.mindcv_models.bit

MindSpore implementation of BiT_ResNet. Refer to Big Transfer (BiT): General Visual Representation Learning.

mindocr.models.backbones.mindcv_models.bit.BiT_ResNet

Bases: nn.Cell

BiT_ResNet model class, based on "Big Transfer (BiT): General Visual Representation Learning" <https://arxiv.org/abs/1912.11370>_

PARAMETER DESCRIPTION
block(Union[Bottleneck])

block of BiT_ResNetv2.

layers(tuple(int))

number of layers of each stage.

wf(int)

width factor applied to each layer. Default: 1.

num_classes(int)

number of classification classes. Default: 1000.

in_channels(int)

number of channels of the input. Default: 3.

groups(int)

number of groups for group conv in blocks. Default: 1.

base_width(int)

base width of per-group hidden channels in blocks. Default: 64.

norm(nn.Cell)

normalization layer in blocks. Default: None.

Source code in mindocr\models\backbones\mindcv_models\bit.py
class BiT_ResNet(nn.Cell):
    r"""BiT_ResNet model class, based on
    `"Big Transfer (BiT): General Visual Representation Learning" <https://arxiv.org/abs/1912.11370>`_
    Args:
        block(Union[Bottleneck]): block of BiT_ResNetv2.
        layers(tuple(int)): number of layers of each stage.
        wf(int): width factor applied to each layer. Default: 1.
        num_classes(int): number of classification classes. Default: 1000.
        in_channels(int): number of channels of the input. Default: 3.
        groups(int): number of groups for group conv in blocks. Default: 1.
        base_width(int): base width of per-group hidden channels in blocks. Default: 64.
        norm(nn.Cell): normalization layer in blocks. Default: None.
    """

    def __init__(
        self,
        block: Type[Union[Bottleneck]],
        layers: List[int],
        wf: int = 1,
        num_classes: int = 1000,
        in_channels: int = 3,
        groups: int = 1,
        base_width: int = 64,
        norm: Optional[nn.Cell] = None,
    ) -> None:
        super().__init__()

        if norm is None:
            norm = nn.GroupNorm

        self.norm: nn.Cell = norm  # add type hints to make pylint happy
        self.input_channels = 64 * wf
        self.groups = groups
        self.base_with = base_width

        self.conv1 = StdConv2d(in_channels, self.input_channels, kernel_size=7,
                               stride=2, pad_mode="pad", padding=3)
        self.pad = nn.ConstantPad2d(1, 0)
        self.max_pool = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode="valid")

        self.layer1 = self._make_layer(block, 64 * wf, layers[0])
        self.layer2 = self._make_layer(block, 128 * wf, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256 * wf, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512 * wf, layers[3], stride=2)

        self.gn = norm(32, 2048 * wf)
        self.relu = nn.ReLU()
        self.pool = GlobalAvgPooling(keep_dims=True)
        self.classifier = nn.Conv2d(512 * block.expansion * wf, num_classes, kernel_size=1, has_bias=True)

    def _make_layer(
        self,
        block: Type[Union[Bottleneck]],
        channels: int,
        block_nums: int,
        stride: int = 1,
    ) -> nn.SequentialCell:
        """build model depending on cfgs"""
        down_sample = None

        if stride != 1 or self.input_channels != channels * block.expansion:
            down_sample = nn.SequentialCell([
                StdConv2d(self.input_channels, channels * block.expansion, kernel_size=1, stride=stride),
            ])

        layers = []
        layers.append(
            block(
                self.input_channels,
                channels,
                stride=stride,
                down_sample=down_sample,
                groups=self.groups,
                base_width=self.base_with,
                norm=self.norm,
            )
        )
        self.input_channels = channels * block.expansion

        for _ in range(1, block_nums):
            layers.append(
                block(
                    self.input_channels,
                    channels,
                    groups=self.groups,
                    base_width=self.base_with,
                    norm=self.norm,
                )
            )

        return nn.SequentialCell(layers)

    def root(self, x: Tensor) -> Tensor:
        x = self.conv1(x)
        x = self.pad(x)
        x = self.max_pool(x)
        return x

    def forward_features(self, x: Tensor) -> Tensor:
        """Network forward feature extraction."""
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        return x

    def forward_head(self, x: Tensor) -> Tensor:
        x = self.gn(x)
        x = self.relu(x)
        x = self.pool(x)
        x = self.classifier(x)
        return x

    def construct(self, x: Tensor) -> Tensor:
        x = self.root(x)
        x = self.forward_features(x)
        x = self.forward_head(x)
        assert x.shape[-2:] == (1, 1)  # We should have no spatial shape left.
        return x[..., 0, 0]
mindocr.models.backbones.mindcv_models.bit.BiT_ResNet.forward_features(x)

Network forward feature extraction.

Source code in mindocr\models\backbones\mindcv_models\bit.py
def forward_features(self, x: Tensor) -> Tensor:
    """Network forward feature extraction."""
    x = self.layer1(x)
    x = self.layer2(x)
    x = self.layer3(x)
    x = self.layer4(x)
    return x
mindocr.models.backbones.mindcv_models.bit.Bottleneck

Bases: nn.Cell

define the basic block of BiT

PARAMETER DESCRIPTION
in_channels(int)

The channel number of the input tensor of the Conv2d layer.

channels(int)

The channel number of the output tensor of the middle Conv2d layer.

stride(int)

The movement stride of the 2D convolution kernel. Default: 1.

groups(int)

Number of groups for group conv in blocks. Default: 1.

base_width(int)

Base width of per-group hidden channels in blocks. Default: 64.

norm(nn.Cell)

Normalization layer in blocks. Default: None.

down_sample(nn.Cell)

Down sample in blocks. Default: None.

Source code in mindocr\models\backbones\mindcv_models\bit.py
class Bottleneck(nn.Cell):
    """define the basic block of BiT
    Args:
          in_channels(int): The channel number of the input tensor of the Conv2d layer.
          channels(int): The channel number of the output tensor of the middle Conv2d layer.
          stride(int): The movement stride of the 2D convolution kernel. Default: 1.
          groups(int): Number of groups for group conv in blocks. Default: 1.
          base_width(int): Base width of per-group hidden channels in blocks. Default: 64.
          norm(nn.Cell): Normalization layer in blocks. Default: None.
          down_sample(nn.Cell): Down sample in blocks. Default: None.
    """

    expansion: int = 4

    def __init__(
        self,
        in_channels: int,
        channels: int,
        stride: int = 1,
        groups: int = 1,
        base_width: int = 64,
        norm: Optional[nn.Cell] = None,
        down_sample: Optional[nn.Cell] = None,
    ) -> None:
        super().__init__()
        if norm is None:
            norm = nn.GroupNorm

        width = int(channels * (base_width / 64.0)) * groups
        self.gn1 = norm(32, in_channels)
        self.conv1 = StdConv2d(in_channels, width, kernel_size=1, stride=1)
        self.gn2 = norm(32, width)
        self.conv2 = StdConv2d(width, width, kernel_size=3, stride=stride,
                               padding=1, pad_mode="pad", group=groups)
        self.gn3 = norm(32, width)
        self.conv3 = StdConv2d(width, channels * self.expansion,
                               kernel_size=1, stride=1)

        self.relu = nn.ReLU()
        self.down_sample = down_sample

    def construct(self, x: Tensor) -> Tensor:
        identity = x
        out = self.gn1(x)
        out = self.relu(out)

        residual = out

        out = self.conv1(out)

        out = self.gn2(out)
        out = self.relu(out)
        out = self.conv2(out)

        out = self.gn3(out)
        out = self.relu(out)
        out = self.conv3(out)

        if self.down_sample is not None:
            identity = self.down_sample(residual)

        out += identity
        # out = self.relu(out)

        return out
mindocr.models.backbones.mindcv_models.bit.StdConv2d

Bases: nn.Conv2d

Conv2d with Weight Standardization

PARAMETER DESCRIPTION
in_channels(int)

The channel number of the input tensor of the Conv2d layer.

out_channels(int)

The channel number of the output tensor of the Conv2d layer.

kernel_size(int)

Specifies the height and width of the 2D convolution kernel.

stride(int)

The movement stride of the 2D convolution kernel. Default: 1.

pad_mode(str)

Specifies padding mode. The optional values are "same", "valid", "pad". Default: "same".

padding(int)

The number of padding on the height and width directions of the input. Default: 0.

group(int)

Splits filter into groups. Default: 1.

Source code in mindocr\models\backbones\mindcv_models\bit.py
class StdConv2d(nn.Conv2d):
    r"""Conv2d with Weight Standardization
    Args:
        in_channels(int): The channel number of the input tensor of the Conv2d layer.
        out_channels(int): The channel number of the output tensor of the Conv2d layer.
        kernel_size(int): Specifies the height and width of the 2D convolution kernel.
        stride(int): The movement stride of the 2D convolution kernel. Default: 1.
        pad_mode(str): Specifies padding mode. The optional values are "same", "valid", "pad". Default: "same".
        padding(int): The number of padding on the height and width directions of the input. Default: 0.
        group(int): Splits filter into groups. Default: 1.
    """

    def __init__(
        self,
        in_channels,
        out_channels,
        kernel_size,
        stride=1,
        pad_mode="same",
        padding=0,
        group=1,
    ) -> None:
        super(StdConv2d, self).__init__(
            in_channels,
            out_channels,
            kernel_size,
            stride,
            pad_mode,
            padding,
            group,
        )
        self.mean_op = ops.ReduceMean(keep_dims=True)

    def construct(self, x):
        w = self.weight
        m = self.mean_op(w, [1, 2, 3])
        v = w.var((1, 2, 3), keepdims=True)
        w = (w - m) / mindspore.ops.sqrt(v + 1e-10)
        output = self.conv2d(x, w)
        return output
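The `construct` above standardizes each output filter to zero mean and unit variance over its (in_channels, kH, kW) axes before the convolution runs; the same transform in NumPy (eps matching the 1e-10 used above):

```python
import numpy as np

def standardize_weights(w, eps=1e-10):
    # w: (out_channels, in_channels, kH, kW); normalize each filter independently.
    m = w.mean(axis=(1, 2, 3), keepdims=True)
    v = w.var(axis=(1, 2, 3), keepdims=True)
    return (w - m) / np.sqrt(v + eps)
```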
mindocr.models.backbones.mindcv_models.bit.BiTresnet101(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get a 101-layer ResNet model. Refer to the base class models.BiT_ResNet for more details.

Source code in mindocr\models\backbones\mindcv_models\bit.py
@register_model
def BiTresnet101(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs):
    """Get 101 layers ResNet model.
    Refer to the base class `models.BiT_Resnet` for more details.
    """
    default_cfg = default_cfgs["BiTresnet101"]
    model = BiT_ResNet(Bottleneck, [3, 4, 23, 3], num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.bit.BiTresnet50(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get a 50-layer ResNet model. Refer to the base class models.BiT_ResNet for more details.

Source code in mindocr\models\backbones\mindcv_models\bit.py
@register_model
def BiTresnet50(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs):
    """Get 50 layers ResNet model.
    Refer to the base class `models.BiT_Resnet` for more details.
    """
    default_cfg = default_cfgs["BiTresnet50"]
    model = BiT_ResNet(Bottleneck, [3, 4, 6, 3], num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.bit.BiTresnet50x3(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get a 50-layer ResNet model with 3x width. Refer to the base class models.BiT_ResNet for more details.

Source code in mindocr\models\backbones\mindcv_models\bit.py
@register_model
def BiTresnet50x3(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs):
    """Get 50 layers ResNet model.
     Refer to the base class `models.BiT_Resnet` for more details.
     """
    default_cfg = default_cfgs["BiTresnet50x3"]
    model = BiT_ResNet(Bottleneck, [3, 4, 6, 3], wf=3, num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.cait

MindSpore implementation of CaiT. Refer to Going deeper with Image Transformers.

mindocr.models.backbones.mindcv_models.cait.AttentionTalkingHead

Bases: nn.Cell

Talking head is a trick for multi-head attention, which adds two more linear maps, one before and one after the softmax, compared to normal attention.

Source code in mindocr\models\backbones\mindcv_models\cait.py
class AttentionTalkingHead(nn.Cell):
    """
    Talking head is a trick for multi-head attention,
    which adds two more linear maps, one before and one after
    the softmax, compared to normal attention.
    """
    def __init__(self,
                 dim: int,
                 num_heads: int = 8,
                 qkv_bias: bool = False,
                 qk_scale: float = None,
                 attn_drop_rate: float = 0.,
                 proj_drop_rate: float = 0.) -> None:
        super(AttentionTalkingHead, self).__init__()
        assert dim % num_heads == 0, "dim should be divisible by num_heads."
        self.num_heads = num_heads
        head_dim = dim // num_heads
        self.scale = qk_scale or head_dim ** -0.5

        self.qkv = nn.Dense(dim, dim * 3, has_bias=qkv_bias)
        self.attn_drop = nn.Dropout(1 - attn_drop_rate)

        self.proj = nn.Dense(dim, dim, has_bias=False)

        self.proj_l = nn.Dense(num_heads, num_heads, has_bias=False)
        self.proj_w = nn.Dense(num_heads, num_heads, has_bias=False)

        self.proj_drop = nn.Dropout(1 - proj_drop_rate)

        self.softmax = nn.Softmax(axis=-1)

        self.attn_matmul_v = ops.BatchMatMul()
        self.q_matmul_k = ops.BatchMatMul(transpose_b=True)

    def construct(self, x) -> Tensor:
        B, N, C = x.shape
        qkv = ops.reshape(self.qkv(x), (B, N, 3, self.num_heads, C // self.num_heads))
        qkv = ops.transpose(qkv, (2, 0, 3, 1, 4))
        q, k, v = ops.unstack(qkv, axis=0)
        q = ops.mul(q, self.scale)

        attn = self.q_matmul_k(q, k)

        attn = ops.transpose(attn, (0, 2, 3, 1))
        attn = self.proj_l(attn)
        attn = ops.transpose(attn, (0, 3, 1, 2))
        attn = self.softmax(attn)
        attn = ops.transpose(attn, (0, 2, 3, 1))
        attn = self.proj_w(attn)
        attn = ops.transpose(attn, (0, 3, 1, 2))

        attn = self.attn_drop(attn)

        x = self.attn_matmul_v(attn, v)
        x = ops.transpose(x, (0, 2, 1, 3))
        x = ops.reshape(x, (B, N, C))
        x = self.proj(x)
        x = self.proj_drop(x)

        return x
mindocr.models.backbones.mindcv_models.coat

CoaT architecture. Modified from timm/models/vision_transformer.py

mindocr.models.backbones.mindcv_models.coat.CoaT

Bases: nn.Cell

CoaT class.

Source code in mindocr\models\backbones\mindcv_models\coat.py
class CoaT(nn.Cell):
    """ CoaT class. """

    def __init__(
        self,
        image_size=224,
        patch_size=16,
        in_chans=3,
        num_classes=1000,
        embed_dims=[0, 0, 0, 0],
        serial_depths=[0, 0, 0, 0],
        parallel_depth=0,
        num_heads=0,
        mlp_ratios=[0, 0, 0, 0],
        qkv_bias=True,
        drop_rate=0.,
        attn_drop_rate=0.,
        drop_path_rate=0.,
        return_interm_layers=False,
        out_features=None,
        crpe_window={3: 2, 5: 3, 7: 3},
        **kwargs
    ) -> None:
        super().__init__()
        self.return_interm_layers = return_interm_layers
        self.out_features = out_features
        self.num_classes = num_classes

        self.patch_embed1 = PatchEmbed(image_size=image_size, patch_size=patch_size,
                                       in_chans=in_chans, embed_dim=embed_dims[0])
        self.patch_embed2 = PatchEmbed(image_size=image_size // (2**2), patch_size=2,
                                       in_chans=embed_dims[0], embed_dim=embed_dims[1])
        self.patch_embed3 = PatchEmbed(image_size=image_size // (2**3), patch_size=2,
                                       in_chans=embed_dims[1], embed_dim=embed_dims[2])
        self.patch_embed4 = PatchEmbed(image_size=image_size // (2**4), patch_size=2,
                                       in_chans=embed_dims[2], embed_dim=embed_dims[3])

        self.cls_token1 = mindspore.Parameter(ops.Zeros()((1, 1, embed_dims[0]), mindspore.float32))
        self.cls_token2 = mindspore.Parameter(ops.Zeros()((1, 1, embed_dims[1]), mindspore.float32))
        self.cls_token3 = mindspore.Parameter(ops.Zeros()((1, 1, embed_dims[2]), mindspore.float32))
        self.cls_token4 = mindspore.Parameter(ops.Zeros()((1, 1, embed_dims[3]), mindspore.float32))

        self.cpe1 = ConvPosEnc(dim=embed_dims[0], k=3)
        self.cpe2 = ConvPosEnc(dim=embed_dims[1], k=3)
        self.cpe3 = ConvPosEnc(dim=embed_dims[2], k=3)
        self.cpe4 = ConvPosEnc(dim=embed_dims[3], k=3)

        self.crpe1 = ConvRelPosEnc(Ch=embed_dims[0] // num_heads, h=num_heads, window=crpe_window)
        self.crpe2 = ConvRelPosEnc(Ch=embed_dims[1] // num_heads, h=num_heads, window=crpe_window)
        self.crpe3 = ConvRelPosEnc(Ch=embed_dims[2] // num_heads, h=num_heads, window=crpe_window)
        self.crpe4 = ConvRelPosEnc(Ch=embed_dims[3] // num_heads, h=num_heads, window=crpe_window)

        dpr = drop_path_rate

        self.serial_blocks1 = nn.CellList([
            SerialBlock(
                dim=embed_dims[0], num_heads=num_heads, mlp_ratio=mlp_ratios[0], qkv_bias=qkv_bias,
                drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr,
                shared_cpe=self.cpe1, shared_crpe=self.crpe1
            )
            for _ in range(serial_depths[0])]
        )

        self.serial_blocks2 = nn.CellList([
            SerialBlock(
                dim=embed_dims[1], num_heads=num_heads, mlp_ratio=mlp_ratios[1], qkv_bias=qkv_bias,
                drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr,
                shared_cpe=self.cpe2, shared_crpe=self.crpe2
            )
            for _ in range(serial_depths[1])]
        )

        self.serial_blocks3 = nn.CellList([
            SerialBlock(
                dim=embed_dims[2], num_heads=num_heads, mlp_ratio=mlp_ratios[2], qkv_bias=qkv_bias,
                drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr,
                shared_cpe=self.cpe3, shared_crpe=self.crpe3
            )
            for _ in range(serial_depths[2])]
        )

        self.serial_blocks4 = nn.CellList([
            SerialBlock(
                dim=embed_dims[3], num_heads=num_heads, mlp_ratio=mlp_ratios[3], qkv_bias=qkv_bias,
                drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr,
                shared_cpe=self.cpe4, shared_crpe=self.crpe4
            )
            for _ in range(serial_depths[3])]
        )

        self.parallel_depth = parallel_depth
        if self.parallel_depth > 0:
            self.parallel_blocks = nn.CellList([
                ParallelBlock(dims=embed_dims,
                              num_heads=num_heads,
                              mlp_ratios=mlp_ratios,
                              qkv_bias=qkv_bias,
                              drop=drop_rate,
                              attn_drop=attn_drop_rate,
                              drop_path=dpr,
                              shared_cpes=[self.cpe1, self.cpe2, self.cpe3, self.cpe4],
                              shared_crpes=[self.crpe1, self.crpe2, self.crpe3, self.crpe4]
                              )
                for _ in range(parallel_depth)]
            )
        else:
            self.parallel_blocks = None

        if not self.return_interm_layers:
            if self.parallel_blocks is not None:
                self.norm2 = nn.LayerNorm((embed_dims[1],), epsilon=1e-6)
                self.norm3 = nn.LayerNorm((embed_dims[2],), epsilon=1e-6)
            else:
                self.norm2 = None
                self.norm3 = None

            self.norm4 = nn.LayerNorm((embed_dims[3],), epsilon=1e-6)

            if self.parallel_depth > 0:
                self.aggregate = nn.Conv1d(in_channels=3,
                                           out_channels=1,
                                           kernel_size=1,
                                           has_bias=True)
                self.head = nn.Dense(embed_dims[3], num_classes) if num_classes > 0 else Identity()
            else:
                self.aggregate = None
                self.head = nn.Dense(embed_dims[3], num_classes) if num_classes > 0 else Identity()

        self.cls_token1.set_data(init.initializer(init.TruncatedNormal(sigma=.02), self.cls_token1.data.shape))
        self.cls_token2.set_data(init.initializer(init.TruncatedNormal(sigma=.02), self.cls_token2.data.shape))
        self.cls_token3.set_data(init.initializer(init.TruncatedNormal(sigma=.02), self.cls_token3.data.shape))
        self.cls_token4.set_data(init.initializer(init.TruncatedNormal(sigma=.02), self.cls_token4.data.shape))
        self._initialize_weights()

    def _initialize_weights(self) -> None:
        for _, cell in self.cells_and_names():
            if isinstance(cell, nn.Dense):
                cell.weight.set_data(init.initializer(init.TruncatedNormal(sigma=.02), cell.weight.data.shape))
                if cell.bias is not None:
                    cell.bias.set_data(init.initializer(init.Constant(0), cell.bias.shape))
            elif isinstance(cell, nn.LayerNorm):
                cell.gamma.set_data(init.initializer(init.Constant(1.0), cell.gamma.shape))
                cell.beta.set_data(init.initializer(init.Constant(0), cell.beta.shape))

    def insert_cls(self, x, cls_token) -> Tensor:
        t0 = x.shape[0]
        t1 = cls_token.shape[1]
        t2 = cls_token.shape[2]
        y = Tensor(np.ones((t0, t1, t2)))
        cls_tokens = cls_token.expand_as(y)

        x = ops.concat((cls_tokens, x), axis=1)
        return x

    def remove_cls(self, x: Tensor) -> Tensor:
        return x[:, 1:, :]

    def forward_features(self, x0: Tensor) -> Union[dict, Tensor]:
        B = x0.shape[0]

        x1 = self.patch_embed1(x0)
        H1, W1 = self.patch_embed1.patches_resolution
        x1 = self.insert_cls(x1, self.cls_token1)
        for blk in self.serial_blocks1:
            x1 = blk(x1, size=(H1, W1))
        x1_nocls = self.remove_cls(x1)
        x1_nocls = ops.reshape(x1_nocls, (B, H1, W1, -1))
        x1_nocls = ops.transpose(x1_nocls, (0, 3, 1, 2))

        x2 = self.patch_embed2(x1_nocls)
        H2, W2 = self.patch_embed2.patches_resolution
        x2 = self.insert_cls(x2, self.cls_token2)
        for blk in self.serial_blocks2:
            x2 = blk(x2, size=(H2, W2))
        x2_nocls = self.remove_cls(x2)
        x2_nocls = ops.reshape(x2_nocls, (B, H2, W2, -1))
        x2_nocls = ops.transpose(x2_nocls, (0, 3, 1, 2))

        x3 = self.patch_embed3(x2_nocls)
        H3, W3 = self.patch_embed3.patches_resolution
        x3 = self.insert_cls(x3, self.cls_token3)
        for blk in self.serial_blocks3:
            x3 = blk(x3, size=(H3, W3))
        x3_nocls = self.remove_cls(x3)
        x3_nocls = ops.reshape(x3_nocls, (B, H3, W3, -1))
        x3_nocls = ops.transpose(x3_nocls, (0, 3, 1, 2))

        x4 = self.patch_embed4(x3_nocls)
        H4, W4 = self.patch_embed4.patches_resolution
        x4 = self.insert_cls(x4, self.cls_token4)
        for blk in self.serial_blocks4:
            x4 = blk(x4, size=(H4, W4))
        x4_nocls = self.remove_cls(x4)
        x4_nocls = ops.reshape(x4_nocls, (B, H4, W4, -1))
        x4_nocls = ops.transpose(x4_nocls, (0, 3, 1, 2))

        if self.parallel_depth == 0:
            if self.return_interm_layers:
                feat_out = {}
                if 'x1_nocls' in self.out_features:
                    feat_out['x1_nocls'] = x1_nocls
                if 'x2_nocls' in self.out_features:
                    feat_out['x2_nocls'] = x2_nocls
                if 'x3_nocls' in self.out_features:
                    feat_out['x3_nocls'] = x3_nocls
                if 'x4_nocls' in self.out_features:
                    feat_out['x4_nocls'] = x4_nocls
                return feat_out
            else:
                x4 = self.norm4(x4)
                x4_cls = x4[:, 0]
                return x4_cls

        for blk in self.parallel_blocks:
            x1, x2, x3, x4 = blk(x1, x2, x3, x4, sizes=[(H1, W1), (H2, W2), (H3, W3), (H4, W4)])

        if self.return_interm_layers:
            feat_out = {}
            if 'x1_nocls' in self.out_features:
                x1_nocls = x1[:, 1:, :].reshape((B, H1, W1, -1)).transpose((0, 3, 1, 2))
                feat_out['x1_nocls'] = x1_nocls
            if 'x2_nocls' in self.out_features:
                x2_nocls = x2[:, 1:, :].reshape((B, H2, W2, -1)).transpose((0, 3, 1, 2))
                feat_out['x2_nocls'] = x2_nocls
            if 'x3_nocls' in self.out_features:
                x3_nocls = x3[:, 1:, :].reshape((B, H3, W3, -1)).transpose((0, 3, 1, 2))
                feat_out['x3_nocls'] = x3_nocls
            if 'x4_nocls' in self.out_features:
                x4_nocls = x4[:, 1:, :].reshape((B, H4, W4, -1)).transpose((0, 3, 1, 2))
                feat_out['x4_nocls'] = x4_nocls
            return feat_out
        else:
            x2 = self.norm2(x2)
            x3 = self.norm3(x3)
            x4 = self.norm4(x4)
            x2_cls = x2[:, :1]
            x3_cls = x3[:, :1]
            x4_cls = x4[:, :1]
            merged_cls = ops.concat((x2_cls, x3_cls, x4_cls), axis=1)
            merged_cls = self.aggregate(merged_cls).squeeze(axis=1)
            return merged_cls

    def construct(self, x: Tensor) -> Union[dict, Tensor]:
        if self.return_interm_layers:
            return self.forward_features(x)
        else:
            x = self.forward_features(x)
            x = self.head(x)
            return x
mindocr.models.backbones.mindcv_models.coat.ConvPosEnc

Bases: nn.Cell

Convolutional Position Encoding. Note: This module is similar to the conditional position encoding in CPVT.
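The tensor bookkeeping around the depthwise convolution can be sketched in NumPy (an illustrative re-implementation, not the MindSpore module itself; `kernels` stands in for the depthwise `nn.Conv2d` weights, and the bias is omitted):

```python
import numpy as np

def depthwise_conv3x3(feat, kernels):
    """feat: (C, H, W); kernels: (C, 3, 3), one 3x3 filter per channel."""
    C, H, W = feat.shape
    padded = np.pad(feat, ((0, 0), (1, 1), (1, 1)))   # 'same' padding
    out = np.zeros_like(feat)
    for i in range(3):
        for j in range(3):
            out += kernels[:, i, j][:, None, None] * padded[:, i:i + H, j:j + W]
    return out

def conv_pos_enc(x, kernels, size):
    """x: (1 + H*W, C) tokens with a leading class token."""
    H, W = size
    cls_tok, img = x[:1], x[1:]
    feat = img.T.reshape(-1, H, W)                    # tokens -> feature map
    feat = depthwise_conv3x3(feat, kernels) + feat    # position encoding + residual
    img = feat.reshape(feat.shape[0], -1).T           # feature map -> tokens
    return np.concatenate([cls_tok, img], axis=0)     # class token passes through

x = np.arange(40, dtype=float).reshape(10, 4)         # 1 cls + 3x3 image tokens, C=4
y = conv_pos_enc(x, np.zeros((4, 3, 3)), size=(3, 3))
```

Note that with all-zero kernels the residual makes the module an identity map, which is a quick sanity check of the token/feature-map round trip.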

Source code in mindocr\models\backbones\mindcv_models\coat.py
class ConvPosEnc(nn.Cell):
    """ Convolutional Position Encoding.
        Note: This module is similar to the conditional position encoding in CPVT.
    """

    def __init__(
        self,
        dim,
        k=3
    ) -> None:
        super(ConvPosEnc, self).__init__()
        self.proj = nn.Conv2d(in_channels=dim,
                              out_channels=dim,
                              kernel_size=k,
                              stride=1,
                              padding=k // 2,
                              group=dim,
                              pad_mode='pad',
                              has_bias=True)

    def construct(self, x, size) -> Tensor:
        B, N, C = x.shape
        H, W = size

        cls_token, img_tokens = x[:, :1], x[:, 1:]

        feat = ops.transpose(img_tokens, (0, 2, 1))
        feat = ops.reshape(feat, (B, C, H, W))
        x = ops.add(self.proj(feat), feat)

        x = ops.reshape(x, (B, C, H * W))
        x = ops.transpose(x, (0, 2, 1))

        x = ops.concat((cls_token, x), axis=1)
        return x
mindocr.models.backbones.mindcv_models.coat.FactorAtt_ConvRelPosEnc

Bases: nn.Cell

Factorized attention with convolutional relative position encoding class.
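The factorization itself can be sketched in NumPy (a simplified single-head illustration that omits the convolutional relative position term `crpe`): applying softmax over the token axis of K and contracting K with V first replaces the N x N attention map with a small d x d context, so the cost becomes linear in sequence length.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def factorized_attention(q, k, v):
    """q, k, v: (N, d). Cost O(N * d^2) instead of O(N^2 * d)."""
    scale = q.shape[-1] ** -0.5
    k_soft = softmax(k, axis=0)     # normalize each key channel over tokens
    context = k_soft.T @ v          # (d, d) summary, independent of N
    return scale * (q @ context)    # (N, d)

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((6, 4)) for _ in range(3))
out = factorized_attention(q, k, v)
```

By associativity this equals `scale * ((q @ softmax(k, 0).T) @ v)`, the multiplication order the class below computes with `BatchMatMul`.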

Source code in mindocr\models\backbones\mindcv_models\coat.py
class FactorAtt_ConvRelPosEnc(nn.Cell):
    """Factorized attention with convolutional relative position encoding class."""

    def __init__(
        self,
        dim,
        num_heads=8,
        qkv_bias=False,
        attn_drop=0.,
        proj_drop=0.,
        shared_crpe=None
    ) -> None:
        super().__init__()
        self.num_heads = num_heads
        head_dim = dim // num_heads
        self.scale = head_dim ** -0.5

        self.q = nn.Dense(in_channels=dim, out_channels=dim, has_bias=qkv_bias)
        self.k = nn.Dense(in_channels=dim, out_channels=dim, has_bias=qkv_bias)
        self.v = nn.Dense(in_channels=dim, out_channels=dim, has_bias=qkv_bias)
        self.attn_drop = nn.Dropout(keep_prob=1 - attn_drop)
        self.proj = nn.Dense(dim, dim)
        self.proj_drop = nn.Dropout(keep_prob=1 - proj_drop)
        self.softmax = nn.Softmax(axis=-1)
        self.batch_matmul = ops.BatchMatMul()

        self.crpe = shared_crpe

    def construct(self, x, size) -> Tensor:
        B, N, C = x.shape
        q = ops.reshape(self.q(x), (B, N, self.num_heads, C // self.num_heads))
        q = ops.transpose(q, (0, 2, 1, 3))
        k = ops.reshape(self.k(x), (B, N, self.num_heads, C // self.num_heads))
        k = ops.transpose(k, (0, 2, 3, 1))
        v = ops.reshape(self.v(x), (B, N, self.num_heads, C // self.num_heads))
        v = ops.transpose(v, (0, 2, 1, 3))

        k_softmax = self.softmax(k)
        factor_att = self.batch_matmul(q, k_softmax)
        factor_att = self.batch_matmul(factor_att, v)

        crpe = self.crpe(q, v, size=size)

        x = ops.mul(self.scale, factor_att)
        x = ops.add(x, crpe)
        x = ops.transpose(x, (0, 2, 1, 3))
        x = ops.reshape(x, (B, N, C))

        x = self.proj(x)
        x = self.proj_drop(x)
        return x
mindocr.models.backbones.mindcv_models.coat.Mlp

Bases: nn.Cell

MLP Cell

Source code in mindocr\models\backbones\mindcv_models\coat.py
class Mlp(nn.Cell):
    """MLP Cell"""

    def __init__(
        self,
        in_features,
        hidden_features=None,
        out_features=None,
        drop=0.0
    ) -> None:
        super().__init__()
        out_features = out_features or in_features
        hidden_features = hidden_features or in_features
        self.fc1 = nn.Dense(in_channels=in_features, out_channels=hidden_features, has_bias=True)
        self.act = nn.GELU(approximate=False)
        self.fc2 = nn.Dense(in_channels=hidden_features, out_channels=out_features, has_bias=True)
        self.drop = nn.Dropout(keep_prob=1.0 - drop)

    def construct(self, x: Tensor) -> Tensor:
        x = self.fc1(x)
        x = self.act(x)
        x = self.drop(x)
        x = self.fc2(x)
        x = self.drop(x)
        return x
mindocr.models.backbones.mindcv_models.coat.ParallelBlock

Bases: nn.Cell

Parallel block class.

Source code in mindocr\models\backbones\mindcv_models\coat.py
class ParallelBlock(nn.Cell):
    """ Parallel block class. """

    def __init__(
        self,
        dims,
        num_heads,
        mlp_ratios=[],
        qkv_bias=False,
        drop=0.,
        attn_drop=0.,
        drop_path=0.,
        shared_cpes=None,
        shared_crpes=None
    ) -> None:
        super().__init__()

        self.cpes = shared_cpes

        self.norm12 = nn.LayerNorm((dims[1],), epsilon=1e-6)
        self.norm13 = nn.LayerNorm((dims[2],), epsilon=1e-6)
        self.norm14 = nn.LayerNorm((dims[3],), epsilon=1e-6)
        self.factoratt_crpe2 = FactorAtt_ConvRelPosEnc(dims[1],
                                                       num_heads=num_heads,
                                                       qkv_bias=qkv_bias,
                                                       attn_drop=attn_drop,
                                                       proj_drop=drop,
                                                       shared_crpe=shared_crpes[1]
                                                       )
        self.factoratt_crpe3 = FactorAtt_ConvRelPosEnc(dims[2],
                                                       num_heads=num_heads,
                                                       qkv_bias=qkv_bias,
                                                       attn_drop=attn_drop,
                                                       proj_drop=drop,
                                                       shared_crpe=shared_crpes[2]
                                                       )
        self.factoratt_crpe4 = FactorAtt_ConvRelPosEnc(dims[3],
                                                       num_heads=num_heads,
                                                       qkv_bias=qkv_bias,
                                                       attn_drop=attn_drop,
                                                       proj_drop=drop,
                                                       shared_crpe=shared_crpes[3]
                                                       )
        self.drop_path = DropPath(drop_path) if drop_path > 0. else Identity()

        self.norm22 = nn.LayerNorm((dims[1],), epsilon=1e-6)
        self.norm23 = nn.LayerNorm((dims[2],), epsilon=1e-6)
        self.norm24 = nn.LayerNorm((dims[3],), epsilon=1e-6)

        mlp_hidden_dim = int(dims[1] * mlp_ratios[1])
        self.mlp2 = self.mlp3 = self.mlp4 = Mlp(in_features=dims[1], hidden_features=mlp_hidden_dim, drop=drop)

    def upsample(self, x, output_size, size) -> Tensor:
        """ Feature map up-sampling. """
        return self.interpolate(x, output_size=output_size, size=size)

    def downsample(self, x, output_size, size) -> Tensor:
        """ Feature map down-sampling. """
        return self.interpolate(x, output_size=output_size, size=size)

    def interpolate(self, x, output_size, size) -> Tensor:
        """ Feature map interpolation. """
        B, N, C = x.shape
        H, W = size

        cls_token = x[:, :1, :]
        img_tokens = x[:, 1:, :]

        img_tokens = ops.transpose(img_tokens, (0, 2, 1))
        img_tokens = ops.reshape(img_tokens, (B, C, H, W))
        img_tokens = ops.interpolate(img_tokens,
                                     sizes=output_size,
                                     mode='bilinear'
                                     )
        img_tokens = ops.reshape(img_tokens, (B, C, -1))
        img_tokens = ops.transpose(img_tokens, (0, 2, 1))

        out = ops.concat((cls_token, img_tokens), axis=1)
        return out

    def construct(self, x1, x2, x3, x4, sizes) -> tuple:
        _, (H2, W2), (H3, W3), (H4, W4) = sizes

        # Conv-Attention.
        x2 = self.cpes[1](x2, size=(H2, W2))  # Note: x1 is ignored.
        x3 = self.cpes[2](x3, size=(H3, W3))
        x4 = self.cpes[3](x4, size=(H4, W4))

        cur2 = self.norm12(x2)
        cur3 = self.norm13(x3)
        cur4 = self.norm14(x4)
        cur2 = self.factoratt_crpe2(cur2, size=(H2, W2))
        cur3 = self.factoratt_crpe3(cur3, size=(H3, W3))
        cur4 = self.factoratt_crpe4(cur4, size=(H4, W4))
        upsample3_2 = self.upsample(cur3, output_size=(H2, W2), size=(H3, W3))
        upsample4_3 = self.upsample(cur4, output_size=(H3, W3), size=(H4, W4))
        upsample4_2 = self.upsample(cur4, output_size=(H2, W2), size=(H4, W4))
        downsample2_3 = self.downsample(cur2, output_size=(H3, W3), size=(H2, W2))
        downsample3_4 = self.downsample(cur3, output_size=(H4, W4), size=(H3, W3))
        downsample2_4 = self.downsample(cur2, output_size=(H4, W4), size=(H2, W2))
        cur2 = cur2 + upsample3_2 + upsample4_2
        cur3 = cur3 + upsample4_3 + downsample2_3
        cur4 = cur4 + downsample3_4 + downsample2_4
        x2 = x2 + self.drop_path(cur2)
        x3 = x3 + self.drop_path(cur3)
        x4 = x4 + self.drop_path(cur4)

        cur2 = self.norm22(x2)
        cur3 = self.norm23(x3)
        cur4 = self.norm24(x4)
        cur2 = self.mlp2(cur2)
        cur3 = self.mlp3(cur3)
        cur4 = self.mlp4(cur4)
        x2 = x2 + self.drop_path(cur2)
        x3 = x3 + self.drop_path(cur3)
        x4 = x4 + self.drop_path(cur4)

        return x1, x2, x3, x4
mindocr.models.backbones.mindcv_models.coat.ParallelBlock.downsample(x, output_size, size)

Feature map down-sampling.

Source code in mindocr\models\backbones\mindcv_models\coat.py
def downsample(self, x, output_size, size) -> Tensor:
    """ Feature map down-sampling. """
    return self.interpolate(x, output_size=output_size, size=size)
mindocr.models.backbones.mindcv_models.coat.ParallelBlock.interpolate(x, output_size, size)

Feature map interpolation.

Source code in mindocr\models\backbones\mindcv_models\coat.py
def interpolate(self, x, output_size, size) -> Tensor:
    """ Feature map interpolation. """
    B, N, C = x.shape
    H, W = size

    cls_token = x[:, :1, :]
    img_tokens = x[:, 1:, :]

    img_tokens = ops.transpose(img_tokens, (0, 2, 1))
    img_tokens = ops.reshape(img_tokens, (B, C, H, W))
    img_tokens = ops.interpolate(img_tokens,
                                 sizes=output_size,
                                 mode='bilinear'
                                 )
    img_tokens = ops.reshape(img_tokens, (B, C, -1))
    img_tokens = ops.transpose(img_tokens, (0, 2, 1))

    out = ops.concat((cls_token, img_tokens), axis=1)
    return out
mindocr.models.backbones.mindcv_models.coat.ParallelBlock.upsample(x, output_size, size)

Feature map up-sampling.

Source code in mindocr\models\backbones\mindcv_models\coat.py
def upsample(self, x, output_size, size) -> Tensor:
    """ Feature map up-sampling. """
    return self.interpolate(x, output_size=output_size, size=size)
mindocr.models.backbones.mindcv_models.coat.PatchEmbed

Bases: nn.Cell

Image to Patch Embedding
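The patchify-and-project step is equivalent to cutting the image into non-overlapping patches and applying one linear map, which a NumPy sketch makes explicit (illustrative only; the module below does this with a strided `nn.Conv2d` followed by a LayerNorm, and the bias is omitted here):

```python
import numpy as np

def patch_embed(img, weight, patch):
    """img: (C, H, W); weight: (embed_dim, C * patch * patch)."""
    C, H, W = img.shape
    gh, gw = H // patch, W // patch
    x = img.reshape(C, gh, patch, gw, patch)
    x = x.transpose(1, 3, 0, 2, 4)                 # (gh, gw, C, ph, pw)
    x = x.reshape(gh * gw, C * patch * patch)      # one row per patch
    return x @ weight.T                            # (num_patches, embed_dim)

rng = np.random.default_rng(0)
img = rng.standard_normal((3, 8, 8))
weight = rng.standard_normal((16, 3 * 4 * 4))
tokens = patch_embed(img, weight, patch=4)         # (4, 16): a 2x2 grid of patches
```

The first output row is exactly the top-left 4x4 patch flattened and projected, mirroring what the stride-`patch_size` convolution computes at that location.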

Source code in mindocr\models\backbones\mindcv_models\coat.py
class PatchEmbed(nn.Cell):
    """ Image to Patch Embedding """

    def __init__(
        self,
        image_size=224,
        patch_size=4,
        in_chans=3,
        embed_dim=96
    ) -> None:
        super().__init__()
        image_size = (image_size, image_size)
        patch_size = (patch_size, patch_size)
        patches_resolution = [image_size[0] // patch_size[0], image_size[1] // patch_size[1]]

        self.image_size = image_size
        self.patch_size = patch_size
        self.patches_resolution = patches_resolution
        self.num_patches = patches_resolution[0] * patches_resolution[1]

        self.in_chans = in_chans
        self.embed_dim = embed_dim

        self.proj = nn.Conv2d(in_channels=in_chans,
                              out_channels=embed_dim,
                              kernel_size=patch_size,
                              stride=patch_size,
                              pad_mode='valid',
                              has_bias=True)

        self.norm = nn.LayerNorm((embed_dim,), epsilon=1e-5)

    def construct(self, x: Tensor) -> Tensor:
        B = x.shape[0]

        x = ops.reshape(self.proj(x), (B, self.embed_dim, -1))
        x = ops.transpose(x, (0, 2, 1))
        x = self.norm(x)

        return x
mindocr.models.backbones.mindcv_models.coat.SerialBlock

Bases: nn.Cell

Serial block class. Note: In this implementation, each serial block contains only a conv-attention and an FFN (MLP) module.

Source code in mindocr\models\backbones\mindcv_models\coat.py
class SerialBlock(nn.Cell):
    """
    Serial block class.
        Note: In this implementation, each serial block contains only a conv-attention and an FFN (MLP) module.
    """

    def __init__(
        self,
        dim,
        num_heads,
        mlp_ratio=4.,
        qkv_bias=False,
        drop=0.,
        attn_drop=0.,
        drop_path=0.,
        shared_cpe=None,
        shared_crpe=None
    ) -> None:
        super().__init__()

        self.cpe = shared_cpe

        self.norm1 = nn.LayerNorm((dim,), epsilon=1e-6)
        self.factoratt_crpe = FactorAtt_ConvRelPosEnc(dim,
                                                      num_heads=num_heads,
                                                      qkv_bias=qkv_bias,
                                                      attn_drop=attn_drop,
                                                      proj_drop=drop,
                                                      shared_crpe=shared_crpe
                                                      )
        self.drop_path = DropPath(drop_path) if drop_path > 0. else Identity()

        self.norm2 = nn.LayerNorm((dim,), epsilon=1e-6)
        mlp_hidden_dim = int(dim * mlp_ratio)
        self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, drop=drop)

    def construct(self, x, size) -> Tensor:
        x = x + self.drop_path(self.factoratt_crpe(self.norm1(self.cpe(x, size)), size))
        x = x + self.drop_path(self.mlp(self.norm2(x)))
        return x
mindocr.models.backbones.mindcv_models.convit

MindSpore implementation of ConViT. Refer to ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases

mindocr.models.backbones.mindcv_models.convit.Block

Bases: nn.Cell

Basic module of ConViT

Source code in mindocr\models\backbones\mindcv_models\convit.py
class Block(nn.Cell):
    """Basic module of ConViT"""

    def __init__(
        self,
        dim: int,
        num_heads: int,
        mlp_ratio: float,
        qkv_bias: bool = False,
        drop: float = 0.0,
        attn_drop: float = 0.0,
        drop_path: float = 0.0,
        use_gpsa: bool = True,
        **kwargs
    ) -> None:
        super().__init__()

        self.norm1 = nn.LayerNorm((dim,))
        if use_gpsa:
            self.attn = GPSA(dim, num_heads=num_heads, qkv_bias=qkv_bias,
                             attn_drop=attn_drop, proj_drop=drop, **kwargs)
        else:
            self.attn = MHSA(dim, num_heads=num_heads, qkv_bias=qkv_bias,
                             attn_drop=attn_drop, proj_drop=drop, **kwargs)
        self.drop_path = DropPath(drop_path) if drop_path > 0.0 else Identity()
        self.norm2 = nn.LayerNorm((dim,))
        mlp_hidden_dim = int(dim * mlp_ratio)
        self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=nn.GELU, drop=drop)

    def construct(self, x: Tensor) -> Tensor:
        x = x + self.drop_path(self.attn(self.norm1(x)))
        x = x + self.drop_path(self.mlp(self.norm2(x)))
        return x
mindocr.models.backbones.mindcv_models.convit.ConViT

Bases: nn.Cell

ConViT model class, based on '"Improving Vision Transformers with Soft Convolutional Inductive Biases" https://arxiv.org/pdf/2103.10697.pdf'

PARAMETER DESCRIPTION
in_channels

number of input channels. Default: 3.

TYPE: int DEFAULT: 3

num_classes

number of classification classes. Default: 1000.

TYPE: int DEFAULT: 1000

image_size

input image size. Default: 224.

TYPE: int DEFAULT: 224

patch_size

image patch size. Default: 16.

TYPE: int DEFAULT: 16

embed_dim

embedding dimension in all heads. Default: 48.

TYPE: int DEFAULT: 48

num_heads

number of heads. Default: 12.

TYPE: int DEFAULT: 12

drop_rate

dropout rate. Default: 0.

TYPE: float DEFAULT: 0.0

drop_path_rate

drop path rate. Default: 0.1.

TYPE: float DEFAULT: 0.1

depth

model block depth. Default: 12.

TYPE: int DEFAULT: 12

mlp_ratio

ratio of hidden features in Mlp. Default: 4.

TYPE: float DEFAULT: 4.0

qkv_bias

whether the qkv layers have bias. Default: False.

TYPE: bool DEFAULT: False

attn_drop_rate

attention layers dropout rate. Default: 0.

TYPE: float DEFAULT: 0.0

locality_strength

determines how focused each head is around its attention center. Default: 1.

TYPE: float DEFAULT: 1.0

local_up_to_layer

number of GPSA layers. Default: 10.

TYPE: int DEFAULT: 10

use_pos_embed

whether to use position embedding. Default: True.

TYPE: bool DEFAULT: True

Source code in mindocr\models\backbones\mindcv_models\convit.py
class ConViT(nn.Cell):
    r"""ConViT model class, based on
    '"Improving Vision Transformers with Soft Convolutional Inductive Biases"
    <https://arxiv.org/pdf/2103.10697.pdf>'

    Args:
        in_channels (int): number of input channels. Default: 3.
        num_classes (int) : number of classification classes. Default: 1000.
        image_size (int) : images input size. Default: 224.
        patch_size (int) : image patch size. Default: 16.
        embed_dim (int) : embedding dimension across all heads. Default: 48.
        num_heads (int) : number of heads. Default: 12.
        drop_rate (float) : dropout rate. Default: 0.
        drop_path_rate (float) : drop path rate. Default: 0.1.
        depth (int) : model block depth. Default: 12.
        mlp_ratio (float) : ratio of hidden features in Mlp. Default: 4.
        qkv_bias (bool) : whether the qkv layers have bias. Default: False.
        attn_drop_rate (float) : attention layers dropout rate. Default: 0.
        locality_strength (float) : determines how focused each head is around its attention center. Default: 1.
        local_up_to_layer (int) : number of GPSA layers. Default: 10.
        use_pos_embed (bool): whether to use position embedding. Default: True.
    """

    def __init__(
        self,
        in_channels: int = 3,
        num_classes: int = 1000,
        image_size: int = 224,
        patch_size: int = 16,
        embed_dim: int = 48,
        num_heads: int = 12,
        drop_rate: float = 0.0,
        drop_path_rate: float = 0.1,
        depth: int = 12,
        mlp_ratio: float = 4.0,
        qkv_bias: bool = False,
        attn_drop_rate: float = 0.0,
        local_up_to_layer: int = 10,
        use_pos_embed: bool = True,
        locality_strength: float = 1.0,
    ) -> None:
        super().__init__()

        self.local_up_to_layer = local_up_to_layer
        self.use_pos_embed = use_pos_embed
        self.num_heads = num_heads
        self.locality_strength = locality_strength
        self.embed_dim = embed_dim

        self.patch_embed = PatchEmbed(
            image_size=image_size, patch_size=patch_size, in_chans=in_channels, embed_dim=embed_dim)
        self.num_patches = self.patch_embed.num_patches

        self.cls_token = Parameter(ops.Zeros()((1, 1, embed_dim), ms.float32))
        self.pos_drop = nn.Dropout(keep_prob=1.0 - drop_rate)

        if self.use_pos_embed:
            self.pos_embed = Parameter(ops.Zeros()((1, self.num_patches, embed_dim), ms.float32))
            self.pos_embed.set_data(init.initializer(init.TruncatedNormal(sigma=0.02), self.pos_embed.data.shape))

        dpr = [x.item() for x in np.linspace(0, drop_path_rate, depth)]
        self.blocks = nn.CellList([
            Block(
                dim=embed_dim, num_heads=num_heads, mlp_ratio=mlp_ratio, qkv_bias=qkv_bias,
                drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[i],
                use_gpsa=True)
            if i < local_up_to_layer else
            Block(
                dim=embed_dim, num_heads=num_heads, mlp_ratio=mlp_ratio, qkv_bias=qkv_bias,
                drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[i],
                use_gpsa=False)
            for i in range(depth)])
        self.norm = nn.LayerNorm((embed_dim,))

        self.classifier = nn.Dense(in_channels=embed_dim, out_channels=num_classes) if num_classes > 0 else Identity()
        self.cls_token.set_data(init.initializer(init.TruncatedNormal(sigma=0.02), self.cls_token.data.shape))
        self._initialize_weights()

    def _initialize_weights(self) -> None:
        for _, cell in self.cells_and_names():
            if isinstance(cell, nn.Dense):
                cell.weight.set_data(init.initializer(init.TruncatedNormal(sigma=0.02), cell.weight.data.shape))
                if cell.bias is not None:
                    cell.bias.set_data(init.initializer(init.Constant(0), cell.bias.shape))
            elif isinstance(cell, nn.LayerNorm):
                cell.gamma.set_data(init.initializer(init.Constant(1), cell.gamma.shape))
                cell.beta.set_data(init.initializer(init.Constant(0), cell.beta.shape))
        # local init
        for i in range(self.local_up_to_layer):
            self.blocks[i].attn.v.weight.set_data(ops.eye(self.embed_dim, self.embed_dim, ms.float32), slice_shape=True)
            locality_distance = 1
            kernel_size = int(self.num_heads**0.5)
            center = (kernel_size - 1) / 2 if kernel_size % 2 == 0 else kernel_size // 2
            pos_weight_data = self.blocks[i].attn.pos_proj.weight.data
            for h1 in range(kernel_size):
                for h2 in range(kernel_size):
                    position = h1 + kernel_size * h2
                    pos_weight_data[position, 2] = -1
                    pos_weight_data[position, 1] = 2 * (h1 - center) * locality_distance
                    pos_weight_data[position, 0] = 2 * (h2 - center) * locality_distance
            pos_weight_data = pos_weight_data * self.locality_strength
            self.blocks[i].attn.pos_proj.weight.set_data(pos_weight_data)

    def forward_features(self, x: Tensor) -> Tensor:
        x = self.patch_embed(x)
        if self.use_pos_embed:
            x = x + self.pos_embed
        x = self.pos_drop(x)
        cls_tokens = ops.tile(self.cls_token, (x.shape[0], 1, 1))
        for u, blk in enumerate(self.blocks):
            if u == self.local_up_to_layer:
                x = ops.Cast()(x, cls_tokens.dtype)
                x = ops.concat((cls_tokens, x), 1)
            x = blk(x)
        x = self.norm(x)
        return x[:, 0]

    def forward_head(self, x: Tensor) -> Tensor:
        x = self.classifier(x)
        return x

    def construct(self, x: Tensor) -> Tensor:
        x = self.forward_features(x)
        x = self.forward_head(x)
        return x
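The constructor builds `depth` blocks with a linearly spaced stochastic-depth (drop path) schedule, using GPSA for the first `local_up_to_layer` blocks and plain self-attention for the rest. A minimal NumPy sketch of that schedule (illustration only, not a mindocr API):

```python
import numpy as np

depth, local_up_to_layer, drop_path_rate = 12, 10, 0.1

# Linearly spaced drop-path rates, one per block (mirrors the constructor).
dpr = [x.item() for x in np.linspace(0, drop_path_rate, depth)]

# The first `local_up_to_layer` blocks use GPSA, the remainder plain SA.
use_gpsa = [i < local_up_to_layer for i in range(depth)]

assert len(dpr) == depth
assert sum(use_gpsa) == local_up_to_layer
assert dpr[0] == 0.0 and abs(dpr[-1] - drop_path_rate) < 1e-9
```

The class token is only concatenated once the network switches from GPSA to ordinary self-attention (see `forward_features`), so GPSA layers operate on patch tokens alone.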
mindocr.models.backbones.mindcv_models.convit.convit_base(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get ConViT base model. Refer to the base class "models.ConViT" for more details.

Source code in mindocr\models\backbones\mindcv_models\convit.py
@register_model
def convit_base(pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3, **kwargs) -> ConViT:
    """Get ConViT base model
    Refer to the base class "models.ConViT" for more details.
    """
    default_cfg = default_cfgs["convit_base"]
    model = ConViT(in_channels=in_channels, num_classes=num_classes,
                   num_heads=16, embed_dim=768, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.convit.convit_base_plus(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get ConViT base+ model. Refer to the base class "models.ConViT" for more details.

Source code in mindocr\models\backbones\mindcv_models\convit.py
@register_model
def convit_base_plus(pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3, **kwargs) -> ConViT:
    """Get ConViT base+ model
    Refer to the base class "models.ConViT" for more details.
    """
    default_cfg = default_cfgs["convit_base_plus"]
    model = ConViT(in_channels=in_channels, num_classes=num_classes,
                   num_heads=16, embed_dim=1024, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.convit.convit_small(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get ConViT small model. Refer to the base class "models.ConViT" for more details.

Source code in mindocr\models\backbones\mindcv_models\convit.py
@register_model
def convit_small(pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3, **kwargs) -> ConViT:
    """Get ConViT small model
    Refer to the base class "models.ConViT" for more details.
    """
    default_cfg = default_cfgs["convit_small"]
    model = ConViT(in_channels=in_channels, num_classes=num_classes,
                   num_heads=9, embed_dim=432, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.convit.convit_small_plus(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get ConViT small+ model. Refer to the base class "models.ConViT" for more details.

Source code in mindocr\models\backbones\mindcv_models\convit.py
@register_model
def convit_small_plus(pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3, **kwargs) -> ConViT:
    """Get ConViT small+ model
    Refer to the base class "models.ConViT" for more details.
    """
    default_cfg = default_cfgs["convit_small_plus"]
    model = ConViT(in_channels=in_channels, num_classes=num_classes,
                   num_heads=9, embed_dim=576, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.convit.convit_tiny(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get ConViT tiny model. Refer to the base class "models.ConViT" for more details.

Source code in mindocr\models\backbones\mindcv_models\convit.py
@register_model
def convit_tiny(pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3, **kwargs) -> ConViT:
    """Get ConViT tiny model
    Refer to the base class "models.ConViT" for more details.
    """
    default_cfg = default_cfgs["convit_tiny"]
    model = ConViT(in_channels=in_channels, num_classes=num_classes,
                   num_heads=4, embed_dim=192, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.convit.convit_tiny_plus(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get ConViT tiny+ model. Refer to the base class "models.ConViT" for more details.

Source code in mindocr\models\backbones\mindcv_models\convit.py
@register_model
def convit_tiny_plus(pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3, **kwargs) -> ConViT:
    """Get ConViT tiny+ model
    Refer to the base class "models.ConViT" for more details.
    """
    default_cfg = default_cfgs["convit_tiny_plus"]
    model = ConViT(in_channels=in_channels, num_classes=num_classes,
                   num_heads=4, embed_dim=256, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
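The six factory functions above differ only in `num_heads` and `embed_dim`; everything else comes from the `ConViT` defaults or `**kwargs`. Collected for reference (values copied from the function bodies; the dict itself is illustrative, not a mindocr API):

```python
# ConViT variant configurations, as passed to ConViT(...) by each factory.
convit_cfgs = {
    "convit_tiny":       {"num_heads": 4,  "embed_dim": 192},
    "convit_tiny_plus":  {"num_heads": 4,  "embed_dim": 256},
    "convit_small":      {"num_heads": 9,  "embed_dim": 432},
    "convit_small_plus": {"num_heads": 9,  "embed_dim": 576},
    "convit_base":       {"num_heads": 16, "embed_dim": 768},
    "convit_base_plus":  {"num_heads": 16, "embed_dim": 1024},
}

# Every embed_dim divides evenly by its head count, so per-head dim is integral.
for name, cfg in convit_cfgs.items():
    assert cfg["embed_dim"] % cfg["num_heads"] == 0
```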
mindocr.models.backbones.mindcv_models.convnext

MindSpore implementation of ConvNeXt. Refer to: A ConvNet for the 2020s

mindocr.models.backbones.mindcv_models.convnext.Block

Bases: nn.Cell

ConvNeXt Block

There are two equivalent implementations

(1) DwConv -> LayerNorm (channels_first) -> 1x1 Conv -> GELU -> 1x1 Conv; all in (N, C, H, W)

(2) DwConv -> Permute to (N, H, W, C); LayerNorm (channels_last) -> Linear -> GELU -> Linear; Permute back

Unlike the official impl, this one allows a choice of (1) or (2). The 1x1 conv can be faster with an appropriate choice of LayerNorm impl; however, as model size increases the tradeoffs appear to change and nn.Linear is the better choice. This was observed with PyTorch 1.10 on a 3090 GPU; it could change over time and with different hardware.

PARAMETER DESCRIPTION
dim

Number of input channels.

TYPE: int

drop_path

Stochastic depth rate. Default: 0.0

TYPE: float DEFAULT: 0.0

layer_scale_init_value

Init value for Layer Scale. Default: 1e-6.

TYPE: float DEFAULT: 1e-06

Source code in mindocr\models\backbones\mindcv_models\convnext.py
class Block(nn.Cell):
    """ConvNeXt Block
    There are two equivalent implementations:
      (1) DwConv -> LayerNorm (channels_first) -> 1x1 Conv -> GELU -> 1x1 Conv; all in (N, C, H, W)
      (2) DwConv -> Permute to (N, H, W, C); LayerNorm (channels_last) -> Linear -> GELU -> Linear; Permute back
    Unlike the official impl, this one allows choice of 1 or 2, 1x1 conv can be faster with appropriate
    choice of LayerNorm impl, however as model size increases the tradeoffs appear to change and nn.Linear
    is a better choice. This was observed with PyTorch 1.10 on 3090 GPU, it could change over time & w/ different HW.
    Args:
        dim (int): Number of input channels.
        drop_path (float): Stochastic depth rate. Default: 0.0
        layer_scale_init_value (float): Init value for Layer Scale. Default: 1e-6.
    """

    def __init__(
        self,
        dim: int,
        drop_path: float = 0.0,
        layer_scale_init_value: float = 1e-6,
    ) -> None:
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, group=dim, has_bias=True)  # depthwise conv
        self.norm = ConvNextLayerNorm((dim,), epsilon=1e-6)
        self.pwconv1 = nn.Dense(dim, 4 * dim)  # pointwise/1x1 convs, implemented with Dense layers
        self.act = nn.GELU()
        self.pwconv2 = nn.Dense(4 * dim, dim)
        self.gamma_ = Parameter(Tensor(layer_scale_init_value * np.ones((dim)), dtype=mstype.float32),
                                requires_grad=True) if layer_scale_init_value > 0 else None
        self.drop_path = DropPath(drop_path) if drop_path > 0.0 else Identity()

    def construct(self, x: Tensor) -> Tensor:
        downsample = x
        x = self.dwconv(x)
        x = ops.transpose(x, (0, 2, 3, 1))
        x = self.norm(x)
        x = self.pwconv1(x)
        x = self.act(x)
        x = self.pwconv2(x)
        if self.gamma_ is not None:
            x = self.gamma_ * x
        x = ops.transpose(x, (0, 3, 1, 2))
        x = downsample + self.drop_path(x)
        return x
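This `construct` follows implementation (2) from the note above: permute to channels_last, apply the pointwise convs as Dense layers on the last axis, permute back, then add the residual. A shape-only NumPy sketch of that data flow (illustrative; ReLU stands in for GELU and random matrices for the trained weights):

```python
import numpy as np

N, C, H, W = 2, 96, 8, 8
x = np.random.randn(N, C, H, W)

y = x.transpose(0, 2, 3, 1)            # (N, C, H, W) -> (N, H, W, C)
y = y @ np.random.randn(C, 4 * C)      # pwconv1: expand C -> 4C on last axis
y = np.maximum(y, 0)                   # activation (GELU in the real block)
y = y @ np.random.randn(4 * C, C)      # pwconv2: project 4C -> C
y = y.transpose(0, 3, 1, 2)            # back to (N, C, H, W)
out = x + y                            # residual connection

assert out.shape == (N, C, H, W)
```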
mindocr.models.backbones.mindcv_models.convnext.ConvNeXt

Bases: nn.Cell

ConvNeXt model class, based on "A ConvNet for the 2020s", https://arxiv.org/abs/2201.03545

PARAMETER DESCRIPTION
in_channels

number of input channels.

TYPE: int

num_classes

number of classes to predict.

TYPE: int

depths

number of blocks in each of the four stages.

TYPE: List[int]

dims

channel dimension of each stage.

TYPE: List[int]

drop_path_rate

drop path rate. Default: 0.0.

TYPE: float DEFAULT: 0.0

layer_scale_init_value

init value for Layer Scale. Default: 1e-6.

TYPE: float DEFAULT: 1e-06

head_init_scale

scaling factor for classifier head initialization. Default: 1.0.

TYPE: float DEFAULT: 1.0

Source code in mindocr\models\backbones\mindcv_models\convnext.py
class ConvNeXt(nn.Cell):
    r"""ConvNeXt model class, based on
    '"A ConvNet for the 2020s" <https://arxiv.org/abs/2201.03545>'
    Args:
        in_channels (int) : number of input channels.
        num_classes (int) : number of classes to predict.
        depths (List[int]) : number of blocks in each of the four stages.
        dims (List[int]) : channel dimension of each stage.
        drop_path_rate (float) : drop path rate. Default: 0.0.
        layer_scale_init_value (float) : init value for Layer Scale. Default: 1e-6.
        head_init_scale (float) : scaling factor for classifier head initialization. Default: 1.0.
    """

    def __init__(
        self,
        in_channels: int,
        num_classes: int,
        depths: List[int],
        dims: List[int],
        drop_path_rate: float = 0.0,
        layer_scale_init_value: float = 1e-6,
        head_init_scale: float = 1.0,
    ):
        super().__init__()

        self.downsample_layers = nn.CellList()  # stem and 3 intermediate down_sampling conv layers
        stem = nn.SequentialCell(
            nn.Conv2d(in_channels, dims[0], kernel_size=4, stride=4, has_bias=True),
            ConvNextLayerNorm((dims[0],), epsilon=1e-6, norm_axis=1),
        )
        self.downsample_layers.append(stem)
        for i in range(3):
            downsample_layer = nn.SequentialCell(
                ConvNextLayerNorm((dims[i],), epsilon=1e-6, norm_axis=1),
                nn.Conv2d(dims[i], dims[i + 1], kernel_size=2, stride=2, has_bias=True),
            )
            self.downsample_layers.append(downsample_layer)

        self.stages = nn.CellList()  # 4 feature resolution stages, each consisting of multiple residual blocks
        dp_rates = list(np.linspace(0, drop_path_rate, sum(depths)))
        cur = 0
        for i in range(4):
            blocks = []
            for j in range(depths[i]):
                blocks.append(Block(dim=dims[i], drop_path=dp_rates[cur + j],
                                    layer_scale_init_value=layer_scale_init_value))
            stage = nn.SequentialCell(blocks)
            self.stages.append(stage)
            cur += depths[i]

        self.norm = ConvNextLayerNorm((dims[-1],), epsilon=1e-6)  # final norm layer
        self.classifier = nn.Dense(dims[-1], num_classes)  # classifier
        self.feature = nn.SequentialCell([
            self.downsample_layers[0],
            self.stages[0],
            self.downsample_layers[1],
            self.stages[1],
            self.downsample_layers[2],
            self.stages[2],
            self.downsample_layers[3],
            self.stages[3]
        ])
        self.head_init_scale = head_init_scale
        self._initialize_weights()

    def _initialize_weights(self) -> None:
        """Initialize weights for cells."""
        for _, cell in self.cells_and_names():
            if isinstance(cell, (nn.Dense, nn.Conv2d)):
                cell.weight.set_data(
                    init.initializer(init.TruncatedNormal(sigma=0.02), cell.weight.shape, cell.weight.dtype)
                )
                if isinstance(cell, nn.Dense) and cell.bias is not None:
                    cell.bias.set_data(init.initializer(init.Zero(), cell.bias.shape, cell.bias.dtype))
        self.classifier.weight.set_data(self.classifier.weight * self.head_init_scale)
        self.classifier.bias.set_data(self.classifier.bias * self.head_init_scale)

    def forward_head(self, x: Tensor) -> Tensor:
        x = self.classifier(x)
        return x

    def forward_features(self, x: Tensor) -> Tensor:
        x = self.feature(x)
        return self.norm(x.mean([-2, -1]))  # global average pooling, (N, C, H, W) -> (N, C)

    def construct(self, x: Tensor) -> Tensor:
        x = self.forward_features(x)
        x = self.forward_head(x)
        return x
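The constructor builds one linearly spaced drop-path schedule over all `sum(depths)` blocks and hands each stage its own slice of `depths[i]` rates. A NumPy sketch of that bookkeeping (illustration only, using the convnext_tiny configuration):

```python
import numpy as np

depths, drop_path_rate = [3, 3, 9, 3], 0.1

# One rate per block across the whole network, increasing with depth.
dp_rates = list(np.linspace(0, drop_path_rate, sum(depths)))

# Each stage consumes the next depths[i] rates (mirrors the `cur` pointer).
cur, per_stage = 0, []
for d in depths:
    per_stage.append(dp_rates[cur:cur + d])
    cur += d

assert [len(s) for s in per_stage] == depths
assert per_stage[0][0] == 0.0
assert abs(per_stage[-1][-1] - drop_path_rate) < 1e-9
```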
mindocr.models.backbones.mindcv_models.convnext.ConvNextLayerNorm

Bases: nn.LayerNorm

LayerNorm for channels_first tensors with 2d spatial dimensions (ie N, C, H, W).

Source code in mindocr\models\backbones\mindcv_models\convnext.py
class ConvNextLayerNorm(nn.LayerNorm):
    r"""LayerNorm for channels_first tensors with 2d spatial dimensions (ie N, C, H, W)."""

    def __init__(
        self,
        normalized_shape: Tuple[int],
        epsilon: float,
        norm_axis: int = -1,
    ) -> None:
        super().__init__(normalized_shape=normalized_shape, epsilon=epsilon)
        assert norm_axis in (-1, 1), "ConvNextLayerNorm's norm_axis must be 1 or -1."
        self.norm_axis = norm_axis

    def construct(self, input_x: Tensor) -> Tensor:
        if self.norm_axis == -1:
            y, _, _ = self.layer_norm(input_x, self.gamma, self.beta)
        else:
            input_x = ops.transpose(input_x, (0, 2, 3, 1))
            y, _, _ = self.layer_norm(input_x, self.gamma, self.beta)
            y = ops.transpose(y, (0, 3, 1, 2))
        return y
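For `norm_axis=1` (channels_first input), the cell transposes to channels_last, normalizes over the now-trailing channel axis, and transposes back. A NumPy equivalent of that data movement, with the affine gamma/beta omitted for brevity (illustration only):

```python
import numpy as np

def channels_first_layernorm(x, eps=1e-6):
    """LayerNorm over C for an (N, C, H, W) tensor, via transposes."""
    y = x.transpose(0, 2, 3, 1)                 # (N, C, H, W) -> (N, H, W, C)
    mean = y.mean(axis=-1, keepdims=True)
    var = y.var(axis=-1, keepdims=True)
    y = (y - mean) / np.sqrt(var + eps)         # normalize across channels
    return y.transpose(0, 3, 1, 2)              # back to (N, C, H, W)

x = np.random.randn(2, 8, 4, 4)
out = channels_first_layernorm(x)
assert out.shape == x.shape
# Each spatial position is now zero-mean across the channel axis.
assert np.allclose(out.mean(axis=1), 0, atol=1e-5)
```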
mindocr.models.backbones.mindcv_models.convnext.convnext_base(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get ConvNeXt base model. Refer to the base class 'models.ConvNeXt' for more details.

Source code in mindocr\models\backbones\mindcv_models\convnext.py
@register_model
def convnext_base(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> ConvNeXt:
    """Get ConvNeXt base model.
    Refer to the base class 'models.ConvNeXt' for more details.
    """
    default_cfg = default_cfgs["convnext_base"]
    model = ConvNeXt(
        in_channels=in_channels, num_classes=num_classes, depths=[3, 3, 27, 3], dims=[128, 256, 512, 1024], **kwargs
    )

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.convnext.convnext_large(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get ConvNeXt large model. Refer to the base class 'models.ConvNeXt' for more details.

Source code in mindocr\models\backbones\mindcv_models\convnext.py
@register_model
def convnext_large(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> ConvNeXt:
    """Get ConvNeXt large model.
    Refer to the base class 'models.ConvNeXt' for more details.
    """
    default_cfg = default_cfgs["convnext_large"]
    model = ConvNeXt(
        in_channels=in_channels, num_classes=num_classes, depths=[3, 3, 27, 3], dims=[192, 384, 768, 1536], **kwargs
    )

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.convnext.convnext_small(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get ConvNeXt small model. Refer to the base class 'models.ConvNeXt' for more details.

Source code in mindocr\models\backbones\mindcv_models\convnext.py
@register_model
def convnext_small(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> ConvNeXt:
    """Get ConvNeXt small model.
    Refer to the base class 'models.ConvNeXt' for more details.
    """
    default_cfg = default_cfgs["convnext_small"]
    model = ConvNeXt(
        in_channels=in_channels, num_classes=num_classes, depths=[3, 3, 27, 3], dims=[96, 192, 384, 768], **kwargs
    )

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.convnext.convnext_tiny(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get ConvNeXt tiny model. Refer to the base class 'models.ConvNeXt' for more details.

Source code in mindocr\models\backbones\mindcv_models\convnext.py
@register_model
def convnext_tiny(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> ConvNeXt:
    """Get ConvNeXt tiny model.
    Refer to the base class 'models.ConvNeXt' for more details.
    """
    default_cfg = default_cfgs["convnext_tiny"]
    model = ConvNeXt(
        in_channels=in_channels, num_classes=num_classes, depths=[3, 3, 9, 3], dims=[96, 192, 384, 768], **kwargs
    )

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.convnext.convnext_xlarge(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get ConvNeXt xlarge model. Refer to the base class 'models.ConvNeXt' for more details.

Source code in mindocr\models\backbones\mindcv_models\convnext.py
@register_model
def convnext_xlarge(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> ConvNeXt:
    """Get ConvNeXt xlarge model.
    Refer to the base class 'models.ConvNeXt' for more details.
    """
    default_cfg = default_cfgs["convnext_xlarge"]
    model = ConvNeXt(
        in_channels=in_channels, num_classes=num_classes, depths=[3, 3, 27, 3], dims=[256, 512, 1024, 2048], **kwargs
    )

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
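The five ConvNeXt factories above differ only in `depths` and `dims`. Collected for reference (values copied from the function bodies; the dict itself is illustrative, not a mindocr API):

```python
# ConvNeXt variant configurations, as passed to ConvNeXt(...) by each factory.
convnext_cfgs = {
    "convnext_tiny":   {"depths": [3, 3, 9, 3],  "dims": [96, 192, 384, 768]},
    "convnext_small":  {"depths": [3, 3, 27, 3], "dims": [96, 192, 384, 768]},
    "convnext_base":   {"depths": [3, 3, 27, 3], "dims": [128, 256, 512, 1024]},
    "convnext_large":  {"depths": [3, 3, 27, 3], "dims": [192, 384, 768, 1536]},
    "convnext_xlarge": {"depths": [3, 3, 27, 3], "dims": [256, 512, 1024, 2048]},
}

# Channel width doubles at each of the three downsampling transitions.
for cfg in convnext_cfgs.values():
    d = cfg["dims"]
    assert all(d[i + 1] == 2 * d[i] for i in range(3))
```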
mindocr.models.backbones.mindcv_models.crossvit

MindSpore implementation of CrossViT. Refer to: CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification

mindocr.models.backbones.mindcv_models.crossvit.PatchEmbed

Bases: nn.Cell

Image to Patch Embedding

Source code in mindocr\models\backbones\mindcv_models\crossvit.py
class PatchEmbed(nn.Cell):
    """ Image to Patch Embedding
    """

    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768, multi_conv=True):
        super().__init__()
        img_size = to_2tuple(img_size)
        patch_size = to_2tuple(patch_size)
        num_patches = (img_size[1] // patch_size[1]) * (img_size[0] // patch_size[0])
        self.img_size = img_size
        self.patch_size = patch_size
        self.num_patches = num_patches
        if multi_conv:
            if patch_size[0] == 12:
                self.proj = nn.SequentialCell(
                    nn.Conv2d(in_chans, embed_dim // 4, pad_mode='pad', kernel_size=7, stride=4, padding=3),
                    nn.ReLU(),
                    nn.Conv2d(embed_dim // 4, embed_dim // 2, pad_mode='pad', kernel_size=3, stride=3, padding=0),
                    nn.ReLU(),
                    nn.Conv2d(embed_dim // 2, embed_dim, pad_mode='pad', kernel_size=3, stride=1, padding=1),
                )
            elif patch_size[0] == 16:
                self.proj = nn.SequentialCell(
                    nn.Conv2d(in_chans, embed_dim // 4, pad_mode='pad', kernel_size=7, stride=4, padding=3),
                    nn.ReLU(),
                    nn.Conv2d(embed_dim // 4, embed_dim // 2, pad_mode='pad', kernel_size=3, stride=2, padding=1),
                    nn.ReLU(),
                    nn.Conv2d(embed_dim // 2, embed_dim, pad_mode='pad', kernel_size=3, stride=2, padding=1),
                )
        else:
            self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size, pad_mode='valid',
                                  has_bias=True)

    def construct(self, x: Tensor) -> Tensor:
        B, C, H, W = x.shape
        # FIXME look at relaxing size constraints

        # assert H == self.img_size[0] and W == self.img_size[1], \
        # f"Input image size ({H}*{W}) doesn't match model ({self.img_size[0]}*{self.img_size[1]})."
        x = self.proj(x)
        B, C, H, W = x.shape
        x = x.reshape(B, C, H * W)
        x = ops.transpose(x, (0, 2, 1))
        return x
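After the projection conv, `construct` flattens the spatial grid and moves channels last, producing the usual `(B, num_patches, embed_dim)` token layout. A NumPy sketch of that reshape (illustration only; the random array stands in for the conv output):

```python
import numpy as np

B, embed_dim, img_size, patch_size = 2, 768, 224, 16
num_patches = (img_size // patch_size) ** 2     # 14 * 14 = 196

# Projection output has shape (B, embed_dim, H/ps, W/ps).
x = np.random.randn(B, embed_dim, img_size // patch_size, img_size // patch_size)

B_, C, H, W = x.shape
x = x.reshape(B_, C, H * W)      # flatten the spatial grid into tokens
x = x.transpose(0, 2, 1)         # tokens-first layout: (B, N, C)

assert x.shape == (B, num_patches, embed_dim)
```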
mindocr.models.backbones.mindcv_models.crossvit.VisionTransformer

Bases: nn.Cell

Vision Transformer with support for patch or hybrid CNN input stage

Source code in mindocr\models\backbones\mindcv_models\crossvit.py
class VisionTransformer(nn.Cell):
    """ Vision Transformer with support for patch or hybrid CNN input stage
    """

    def __init__(self, img_size=(224, 224), patch_size=(8, 16), in_channels=3, num_classes=1000, embed_dim=(192, 384),
                 depth=([1, 3, 1], [1, 3, 1], [1, 3, 1]),
                 num_heads=(6, 12), mlp_ratio=(2., 2., 4.), qkv_bias=False, qk_scale=None, drop_rate=0.,
                 attn_drop_rate=0.,
                 drop_path_rate=0., hybrid_backbone=None, norm_layer=nn.LayerNorm, multi_conv=False):
        super().__init__()

        self.num_classes = num_classes
        if not isinstance(img_size, list):
            img_size = to_2tuple(img_size)
        self.img_size = img_size

        num_patches = _compute_num_patches(img_size, patch_size)
        self.num_branches = len(patch_size)

        patch_embed = []
        if hybrid_backbone is None:
            b = []
            for i in range(self.num_branches):
                c = ms.Parameter(Tensor(np.zeros([1, 1 + num_patches[i], embed_dim[i]], np.float32)),
                                 name='pos_embed.' + str(i))
                b.append(c)
            b = tuple(b)
            self.pos_embed = ms.ParameterTuple(b)
            for im_s, p, d in zip(img_size, patch_size, embed_dim):
                patch_embed.append(
                    PatchEmbed(img_size=im_s, patch_size=p, in_chans=in_channels, embed_dim=d, multi_conv=multi_conv))
            self.patch_embed = nn.CellList(patch_embed)

        d = []
        for i in range(self.num_branches):
            c = ms.Parameter(Tensor(np.zeros([1, 1, embed_dim[i]], np.float32)), name='cls_token.' + str(i))
            d.append(c)
        d = tuple(d)
        self.cls_token = ms.ParameterTuple(d)
        self.pos_drop = nn.Dropout(1.0 - drop_rate)

        total_depth = sum([sum(x[-2:]) for x in depth])
        dpr = np.linspace(0, drop_path_rate, total_depth)  # stochastic depth decay rule
        dpr_ptr = 0
        self.blocks = nn.CellList()
        for idx, block_cfg in enumerate(depth):
            curr_depth = max(block_cfg[:-1]) + block_cfg[-1]
            dpr_ = dpr[dpr_ptr:dpr_ptr + curr_depth]
            blk = MultiScaleBlock(embed_dim, num_patches, block_cfg, num_heads=num_heads, mlp_ratio=mlp_ratio,
                                  qkv_bias=qkv_bias, qk_scale=qk_scale, drop=drop_rate, attn_drop=attn_drop_rate,
                                  drop_path=dpr_,
                                  norm_layer=norm_layer)
            dpr_ptr += curr_depth
            self.blocks.append(blk)

        self.norm = nn.CellList([norm_layer((embed_dim[i],), epsilon=1e-6) for i in range(self.num_branches)])
        self.head = nn.CellList([nn.Dense(embed_dim[i], num_classes) if num_classes > 0 else Identity() for i in
                                 range(self.num_branches)])

        for i in range(self.num_branches):
            if self.pos_embed[i].requires_grad:
                tensor1 = init.initializer(TruncatedNormal(sigma=.02), self.pos_embed[i].data.shape, ms.float32)
                self.pos_embed[i].set_data(tensor1)
            tensor2 = init.initializer(TruncatedNormal(sigma=.02), self.cls_token[i].data.shape, ms.float32)
            self.cls_token[i].set_data(tensor2)

        self._initialize_weights()

    def _initialize_weights(self) -> None:
        for _, cell in self.cells_and_names():
            if isinstance(cell, nn.Dense):
                cell.weight.set_data(init.initializer(init.TruncatedNormal(sigma=.02), cell.weight.data.shape))
                if cell.bias is not None:
                    cell.bias.set_data(init.initializer(init.Constant(0), cell.bias.shape))
            elif isinstance(cell, nn.LayerNorm):
                cell.gamma.set_data(init.initializer(init.Constant(1), cell.gamma.shape))
                cell.beta.set_data(init.initializer(init.Constant(0), cell.beta.shape))

    def no_weight_decay(self):
        out = {'cls_token'}
        if self.pos_embed[0].requires_grad:
            out.add('pos_embed')
        return out

    def get_classifier(self):
        return self.head

    def reset_classifier(self, num_classes, global_pool=''):
        self.num_classes = num_classes
        self.head = nn.Dense(self.embed_dim, num_classes) if num_classes > 0 else Identity()

    def forward_features(self, x: Tensor) -> Tensor:
        B, C, H, W = x.shape
        xs = []
        for i in range(self.num_branches):
            x_ = ops.interpolate(x, sizes=(self.img_size[i], self.img_size[i]), mode='bilinear') if H != self.img_size[
                i] else x
            tmp = self.patch_embed[i](x_)
            z = self.cls_token[i].shape
            y = Tensor(np.ones((B, z[1], z[2])), dtype=mstype.float32)
            cls_tokens = self.cls_token[i]
            cls_tokens = cls_tokens.expand_as(y)  # stole cls_tokens impl from Phil Wang, thanks
            con = ops.Concat(1)
            cls_tokens = cls_tokens.astype("float32")
            tmp = tmp.astype("float32")
            tmp = con((cls_tokens, tmp))
            tmp = tmp + self.pos_embed[i]
            tmp = self.pos_drop(tmp)
            xs.append(tmp)

        for blk in self.blocks:
            xs = blk(xs)

        # NOTE: moved here so that tokens from every branch pass through layer norm before the class token is extracted
        k = 0
        xs2 = []
        for x in xs:
            xs2.append(self.norm[k](x))
            k = k + 1
        xs = xs2
        out = []
        for x in xs:
            out.append(x[:, 0])
        return out

    def forward_head(self, x: Tensor) -> Tensor:
        ce_logits = []
        zz = 0
        for c in x:
            ce_logits.append(self.head[zz](c))
            zz = zz + 1
        z = ops.stack([ce_logits[0], ce_logits[1]])
        op = ops.ReduceMean(keep_dims=False)
        ce_logits = op(z, 0)
        return ce_logits

    def construct(self, x: Tensor) -> Tensor:
        x = self.forward_features(x)
        x = self.forward_head(x)
        return x
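The stochastic depth bookkeeping in `__init__` above (drop-path rates rising linearly from 0 to `drop_path_rate`, with each `MultiScaleBlock` consuming a contiguous slice) can be checked with a plain-Python sketch; the three-branch `depth` config below is a hypothetical example for illustration, not a value from the source:

```python
def linspace(start, stop, num):
    """Pure-Python stand-in for np.linspace."""
    if num == 1:
        return [start]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]

def drop_path_schedule(depth, drop_path_rate):
    # total_depth = sum([sum(x[-2:]) for x in depth]) in the source
    total_depth = sum(sum(cfg[-2:]) for cfg in depth)
    dpr = linspace(0.0, drop_path_rate, total_depth)
    slices, ptr = [], 0
    for cfg in depth:
        # curr_depth = max(block_cfg[:-1]) + block_cfg[-1] in the source
        curr = max(cfg[:-1]) + cfg[-1]
        slices.append(dpr[ptr:ptr + curr])
        ptr += curr
    return slices

# Hypothetical CrossViT-style config: three stages of [1, 4, 0]
sched = drop_path_schedule([[1, 4, 0], [1, 4, 0], [1, 4, 0]], 0.1)
```

Each stage receives a disjoint slice of the schedule, and the slices together cover every block exactly once.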
mindocr.models.backbones.mindcv_models.densenet

MindSpore implementation of DenseNet. Refer to: Densely Connected Convolutional Networks

mindocr.models.backbones.mindcv_models.densenet.DenseNet

Bases: nn.Cell

Densenet-BC model class, based on "Densely Connected Convolutional Networks" <https://arxiv.org/pdf/1608.06993.pdf>_

PARAMETER DESCRIPTION
growth_rate

how many filters to add each layer (k in paper). Default: 32.

TYPE: int DEFAULT: 32

block_config

how many layers in each pooling block. Default: (6, 12, 24, 16).

TYPE: Tuple[int, int, int, int] DEFAULT: (6, 12, 24, 16)

num_init_features

number of filters in the first Conv2d. Default: 64.

TYPE: int DEFAULT: 64

bn_size

multiplicative factor for number of bottleneck layers (i.e. bn_size * k features in the bottleneck layer). Default: 4.

TYPE: int DEFAULT: 4

drop_rate

dropout rate after each dense layer. Default: 0.

TYPE: float DEFAULT: 0.0

in_channels

number of input channels. Default: 3.

TYPE: int DEFAULT: 3

num_classes

number of classification classes. Default: 1000.

TYPE: int DEFAULT: 1000

Source code in mindocr\models\backbones\mindcv_models\densenet.py
class DenseNet(nn.Cell):
    r"""Densenet-BC model class, based on
    `"Densely Connected Convolutional Networks" <https://arxiv.org/pdf/1608.06993.pdf>`_

    Args:
        growth_rate: how many filters to add each layer (`k` in paper). Default: 32.
        block_config: how many layers in each pooling block. Default: (6, 12, 24, 16).
        num_init_features: number of filters in the first Conv2d. Default: 64.
        bn_size (int): multiplicative factor for number of bottleneck layers
          (i.e. bn_size * k features in the bottleneck layer). Default: 4.
        drop_rate: dropout rate after each dense layer. Default: 0.
        in_channels: number of input channels. Default: 3.
        num_classes: number of classification classes. Default: 1000.
    """

    def __init__(
        self,
        growth_rate: int = 32,
        block_config: Tuple[int, int, int, int] = (6, 12, 24, 16),
        num_init_features: int = 64,
        bn_size: int = 4,
        drop_rate: float = 0.0,
        in_channels: int = 3,
        num_classes: int = 1000,
    ) -> None:
        super().__init__()
        layers = OrderedDict()
        # first Conv2d
        num_features = num_init_features
        layers["conv0"] = nn.Conv2d(in_channels, num_features, kernel_size=7, stride=2, pad_mode="pad", padding=3)
        layers["norm0"] = nn.BatchNorm2d(num_features)
        layers["relu0"] = nn.ReLU()
        layers["pool0"] = nn.SequentialCell([
            nn.Pad(paddings=((0, 0), (0, 0), (1, 1), (1, 1)), mode="CONSTANT"),
            nn.MaxPool2d(kernel_size=3, stride=2),
        ])

        # DenseBlock
        for i, num_layers in enumerate(block_config):
            block = _DenseBlock(
                num_layers=num_layers,
                num_input_features=num_features,
                bn_size=bn_size,
                growth_rate=growth_rate,
                drop_rate=drop_rate,
            )
            layers[f"denseblock{i + 1}"] = block
            num_features += num_layers * growth_rate
            if i != len(block_config) - 1:
                transition = _Transition(num_features, num_features // 2)
                layers[f"transition{i + 1}"] = transition
                num_features = num_features // 2

        # final bn+ReLU
        layers["norm5"] = nn.BatchNorm2d(num_features)
        layers["relu5"] = nn.ReLU()

        self.num_features = num_features
        self.features = nn.SequentialCell(layers)
        self.pool = GlobalAvgPooling()
        self.classifier = nn.Dense(self.num_features, num_classes)
        self._initialize_weights()

    def _initialize_weights(self) -> None:
        """Initialize weights for cells."""
        for _, cell in self.cells_and_names():
            if isinstance(cell, nn.Conv2d):
                cell.weight.set_data(
                    init.initializer(init.HeNormal(math.sqrt(5), mode="fan_out", nonlinearity="relu"),
                                     cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(
                        init.initializer(init.HeUniform(math.sqrt(5), mode="fan_in", nonlinearity="leaky_relu"),
                                         cell.bias.shape, cell.bias.dtype))
            elif isinstance(cell, nn.BatchNorm2d):
                cell.gamma.set_data(init.initializer("ones", cell.gamma.shape, cell.gamma.dtype))
                cell.beta.set_data(init.initializer("zeros", cell.beta.shape, cell.beta.dtype))
            elif isinstance(cell, nn.Dense):
                cell.weight.set_data(
                    init.initializer(init.HeUniform(math.sqrt(5), mode="fan_in", nonlinearity="leaky_relu"),
                                     cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(init.initializer("zeros", cell.bias.shape, cell.bias.dtype))

    def forward_features(self, x: Tensor) -> Tensor:
        x = self.features(x)
        return x

    def forward_head(self, x: Tensor) -> Tensor:
        x = self.pool(x)
        x = self.classifier(x)
        return x

    def construct(self, x: Tensor) -> Tensor:
        x = self.forward_features(x)
        x = self.forward_head(x)
        return x
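The channel bookkeeping in the constructor above (each dense block adds `num_layers * growth_rate` channels; every transition except after the last block halves the count) can be verified in pure Python:

```python
def densenet_num_features(num_init_features, block_config, growth_rate):
    # Mirrors the DenseBlock/_Transition loop in DenseNet.__init__.
    num_features = num_init_features
    for i, num_layers in enumerate(block_config):
        num_features += num_layers * growth_rate
        if i != len(block_config) - 1:
            num_features //= 2  # _Transition halves the channels
    return num_features

# DenseNet-121 configuration from the source: 64 init features, k = 32
print(densenet_num_features(64, (6, 12, 24, 16), 32))  # → 1024
```

This 1024 is exactly `self.num_features`, the input width of the final `nn.Dense` classifier.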
mindocr.models.backbones.mindcv_models.densenet.densenet121(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get the 121-layer DenseNet model. Refer to the base class models.DenseNet for more details.

Source code in mindocr\models\backbones\mindcv_models\densenet.py
@register_model
def densenet121(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> DenseNet:
    """Get 121 layers DenseNet model.
     Refer to the base class `models.DenseNet` for more details."""
    default_cfg = default_cfgs["densenet121"]
    model = DenseNet(growth_rate=32, block_config=(6, 12, 24, 16), num_init_features=64, in_channels=in_channels,
                     num_classes=num_classes, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.densenet.densenet161(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get the 161-layer DenseNet model. Refer to the base class models.DenseNet for more details.

Source code in mindocr\models\backbones\mindcv_models\densenet.py
@register_model
def densenet161(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> DenseNet:
    """Get 161 layers DenseNet model.
     Refer to the base class `models.DenseNet` for more details."""
    default_cfg = default_cfgs["densenet161"]
    model = DenseNet(growth_rate=48, block_config=(6, 12, 36, 24), num_init_features=96, in_channels=in_channels,
                     num_classes=num_classes, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.densenet.densenet169(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get the 169-layer DenseNet model. Refer to the base class models.DenseNet for more details.

Source code in mindocr\models\backbones\mindcv_models\densenet.py
@register_model
def densenet169(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> DenseNet:
    """Get 169 layers DenseNet model.
     Refer to the base class `models.DenseNet` for more details."""
    default_cfg = default_cfgs["densenet169"]
    model = DenseNet(growth_rate=32, block_config=(6, 12, 32, 32), num_init_features=64, in_channels=in_channels,
                     num_classes=num_classes, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.densenet.densenet201(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get the 201-layer DenseNet model. Refer to the base class models.DenseNet for more details.

Source code in mindocr\models\backbones\mindcv_models\densenet.py
@register_model
def densenet201(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> DenseNet:
    """Get 201 layers DenseNet model.
     Refer to the base class `models.DenseNet` for more details."""
    default_cfg = default_cfgs["densenet201"]
    model = DenseNet(growth_rate=32, block_config=(6, 12, 48, 32), num_init_features=64, in_channels=in_channels,
                     num_classes=num_classes, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.download

Utilities for downloading.

mindocr.models.backbones.mindcv_models.download.DownLoad

Base utility class for downloading.

Source code in mindocr\models\backbones\mindcv_models\download.py
class DownLoad:
    """Base utility class for downloading."""

    USER_AGENT: str = (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/92.0.4515.131 Safari/537.36"
    )

    @staticmethod
    def calculate_md5(file_path: str, chunk_size: int = 1024 * 1024) -> str:
        """Calculate md5 value."""
        md5 = hashlib.md5()
        with open(file_path, "rb") as fp:
            for chunk in iter(lambda: fp.read(chunk_size), b""):
                md5.update(chunk)
        return md5.hexdigest()

    def check_md5(self, file_path: str, md5: Optional[str] = None) -> bool:
        """Check md5 value."""
        return md5 == self.calculate_md5(file_path)

    @staticmethod
    def extract_tar(from_path: str, to_path: Optional[str] = None, compression: Optional[str] = None) -> None:
        """Extract tar format file."""

        with tarfile.open(from_path, f"r:{compression[1:]}" if compression else "r") as tar:
            tar.extractall(to_path)

    @staticmethod
    def extract_zip(from_path: str, to_path: Optional[str] = None, compression: Optional[str] = None) -> None:
        """Extract zip format file."""

        compression_mode = zipfile.ZIP_BZIP2 if compression else zipfile.ZIP_STORED
        with zipfile.ZipFile(from_path, "r", compression=compression_mode) as zip_file:
            zip_file.extractall(to_path)

    def extract_archive(self, from_path: str, to_path: str = None) -> str:
        """Extract and  archive from path to path."""
        archive_extractors = {
            ".tar": self.extract_tar,
            ".zip": self.extract_zip,
        }
        compress_file_open = {
            ".bz2": bz2.open,
            ".gz": gzip.open,
        }

        if not to_path:
            to_path = os.path.dirname(from_path)

        suffix, archive_type, compression = detect_file_type(from_path)  # pylint: disable=unused-variable

        if not archive_type:
            to_path = from_path.replace(suffix, "")
            compress = compress_file_open[compression]
            with compress(from_path, "rb") as rf, open(to_path, "wb") as wf:
                wf.write(rf.read())
            return to_path

        extractor = archive_extractors[archive_type]
        extractor(from_path, to_path, compression)

        return to_path

    def download_file(self, url: str, file_path: str, chunk_size: int = 1024):
        """Download a file."""

        # no check certificate
        ctx = ssl.create_default_context()
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE

        # Define request headers.
        headers = {"User-Agent": self.USER_AGENT}

        _logger.info(f"Downloading from {url} to {file_path} ...")
        with open(file_path, "wb") as f:
            request = urllib.request.Request(url, headers=headers)
            with urllib.request.urlopen(request, context=ctx) as response:
                with tqdm(total=response.length, unit="B") as pbar:
                    for chunk in iter(lambda: response.read(chunk_size), b""):
                        if not chunk:
                            break
                        pbar.update(chunk_size)
                        f.write(chunk)

    def download_url(
        self,
        url: str,
        path: Optional[str] = None,
        filename: Optional[str] = None,
        md5: Optional[str] = None,
    ) -> str:
        """Download a file from a url and place it in root."""
        if path is None:
            path = get_default_download_root()
        path = os.path.expanduser(path)
        os.makedirs(path, exist_ok=True)

        if not filename:
            filename = os.path.basename(url)

        file_path = os.path.join(path, filename)

        # Check if the file already exists.
        if os.path.isfile(file_path):
            if not md5 or self.check_md5(file_path, md5):
                return file_path

        # Download the file.
        try:
            self.download_file(url, file_path)
        except (urllib.error.URLError, IOError) as e:
            if url.startswith("https"):
                url = url.replace("https", "http")
                try:
                    self.download_file(url, file_path)
                except (urllib.error.URLError, IOError):
                    # pylint: disable=protected-access
                    ssl._create_default_https_context = ssl._create_unverified_context
                    self.download_file(url, file_path)
                    ssl._create_default_https_context = ssl.create_default_context
            else:
                raise e

        return file_path

    def download_and_extract_archive(
        self,
        url: str,
        download_path: Optional[str] = None,
        extract_path: Optional[str] = None,
        filename: Optional[str] = None,
        md5: Optional[str] = None,
        remove_finished: bool = False,
    ) -> None:
        """Download and extract archive."""
        if download_path is None:
            download_path = get_default_download_root()
        download_path = os.path.expanduser(download_path)

        if not filename:
            filename = os.path.basename(url)

        self.download_url(url, download_path, filename, md5)

        archive = os.path.join(download_path, filename)
        self.extract_archive(archive, extract_path)

        if remove_finished:
            os.remove(archive)
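The chunked MD5 computation used by `calculate_md5` above needs only the standard library; this sketch mirrors the method on a temporary file so the digest can be checked directly:

```python
import hashlib
import tempfile

def calculate_md5(file_path, chunk_size=1024 * 1024):
    # Same chunked read as DownLoad.calculate_md5: the file is hashed
    # piece by piece instead of being loaded into memory at once.
    md5 = hashlib.md5()
    with open(file_path, "rb") as fp:
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello world")
    path = tmp.name

digest = calculate_md5(path)
print(digest)  # → 5eb63bbbe01eeed093cb22bb8f5acdc3 (MD5 of b"hello world")
```

`check_md5` then reduces to comparing this hex digest against the expected value.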
mindocr.models.backbones.mindcv_models.download.DownLoad.calculate_md5(file_path, chunk_size=1024 * 1024) staticmethod

Calculate md5 value.

Source code in mindocr\models\backbones\mindcv_models\download.py
@staticmethod
def calculate_md5(file_path: str, chunk_size: int = 1024 * 1024) -> str:
    """Calculate md5 value."""
    md5 = hashlib.md5()
    with open(file_path, "rb") as fp:
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()
mindocr.models.backbones.mindcv_models.download.DownLoad.check_md5(file_path, md5=None)

Check md5 value.

Source code in mindocr\models\backbones\mindcv_models\download.py
def check_md5(self, file_path: str, md5: Optional[str] = None) -> bool:
    """Check md5 value."""
    return md5 == self.calculate_md5(file_path)
mindocr.models.backbones.mindcv_models.download.DownLoad.download_and_extract_archive(url, download_path=None, extract_path=None, filename=None, md5=None, remove_finished=False)

Download and extract archive.

Source code in mindocr\models\backbones\mindcv_models\download.py
def download_and_extract_archive(
    self,
    url: str,
    download_path: Optional[str] = None,
    extract_path: Optional[str] = None,
    filename: Optional[str] = None,
    md5: Optional[str] = None,
    remove_finished: bool = False,
) -> None:
    """Download and extract archive."""
    if download_path is None:
        download_path = get_default_download_root()
    download_path = os.path.expanduser(download_path)

    if not filename:
        filename = os.path.basename(url)

    self.download_url(url, download_path, filename, md5)

    archive = os.path.join(download_path, filename)
    self.extract_archive(archive, extract_path)

    if remove_finished:
        os.remove(archive)
mindocr.models.backbones.mindcv_models.download.DownLoad.download_file(url, file_path, chunk_size=1024)

Download a file.

Source code in mindocr\models\backbones\mindcv_models\download.py
def download_file(self, url: str, file_path: str, chunk_size: int = 1024):
    """Download a file."""

    # no check certificate
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    # Define request headers.
    headers = {"User-Agent": self.USER_AGENT}

    _logger.info(f"Downloading from {url} to {file_path} ...")
    with open(file_path, "wb") as f:
        request = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(request, context=ctx) as response:
            with tqdm(total=response.length, unit="B") as pbar:
                for chunk in iter(lambda: response.read(chunk_size), b""):
                    if not chunk:
                        break
                    pbar.update(chunk_size)
                    f.write(chunk)
mindocr.models.backbones.mindcv_models.download.DownLoad.download_url(url, path=None, filename=None, md5=None)

Download a file from a url and place it in root.

Source code in mindocr\models\backbones\mindcv_models\download.py
def download_url(
    self,
    url: str,
    path: Optional[str] = None,
    filename: Optional[str] = None,
    md5: Optional[str] = None,
) -> str:
    """Download a file from a url and place it in root."""
    if path is None:
        path = get_default_download_root()
    path = os.path.expanduser(path)
    os.makedirs(path, exist_ok=True)

    if not filename:
        filename = os.path.basename(url)

    file_path = os.path.join(path, filename)

    # Check if the file already exists.
    if os.path.isfile(file_path):
        if not md5 or self.check_md5(file_path, md5):
            return file_path

    # Download the file.
    try:
        self.download_file(url, file_path)
    except (urllib.error.URLError, IOError) as e:
        if url.startswith("https"):
            url = url.replace("https", "http")
            try:
                self.download_file(url, file_path)
            except (urllib.error.URLError, IOError):
                # pylint: disable=protected-access
                ssl._create_default_https_context = ssl._create_unverified_context
                self.download_file(url, file_path)
                ssl._create_default_https_context = ssl.create_default_context
        else:
            raise e

    return file_path
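The https-to-http retry step in `download_url` above can be isolated into a small testable sketch; `fetch` below is a hypothetical stand-in for `download_file`, and the URL is an illustrative example:

```python
def download_with_fallback(url, fetch):
    """Sketch of the first fallback step in DownLoad.download_url.

    `fetch(url)` is a hypothetical downloader that raises IOError on
    failure; the real method calls self.download_file(url, file_path)
    and has a further unverified-SSL-context fallback not shown here.
    """
    try:
        return fetch(url)
    except IOError:
        if not url.startswith("https"):
            raise
        # Retry over plain http, as download_url does.
        return fetch(url.replace("https", "http", 1))

attempts = []
def flaky_fetch(url):
    # Simulate a server that rejects https but serves http.
    attempts.append(url)
    if url.startswith("https"):
        raise IOError("TLS handshake failed")
    return "ok"

result = download_with_fallback("https://example.com/model.ckpt", flaky_fetch)
```

The first attempt fails over TLS, the second succeeds over plain http, matching the control flow of the method.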
mindocr.models.backbones.mindcv_models.download.DownLoad.extract_archive(from_path, to_path=None)

Extract an archive from from_path to to_path.

Source code in mindocr\models\backbones\mindcv_models\download.py
def extract_archive(self, from_path: str, to_path: str = None) -> str:
    """Extract and  archive from path to path."""
    archive_extractors = {
        ".tar": self.extract_tar,
        ".zip": self.extract_zip,
    }
    compress_file_open = {
        ".bz2": bz2.open,
        ".gz": gzip.open,
    }

    if not to_path:
        to_path = os.path.dirname(from_path)

    suffix, archive_type, compression = detect_file_type(from_path)  # pylint: disable=unused-variable

    if not archive_type:
        to_path = from_path.replace(suffix, "")
        compress = compress_file_open[compression]
        with compress(from_path, "rb") as rf, open(to_path, "wb") as wf:
            wf.write(rf.read())
        return to_path

    extractor = archive_extractors[archive_type]
    extractor(from_path, to_path, compression)

    return to_path
mindocr.models.backbones.mindcv_models.download.DownLoad.extract_tar(from_path, to_path=None, compression=None) staticmethod

Extract tar format file.

Source code in mindocr\models\backbones\mindcv_models\download.py
@staticmethod
def extract_tar(from_path: str, to_path: Optional[str] = None, compression: Optional[str] = None) -> None:
    """Extract tar format file."""

    with tarfile.open(from_path, f"r:{compression[1:]}" if compression else "r") as tar:
        tar.extractall(to_path)
mindocr.models.backbones.mindcv_models.download.DownLoad.extract_zip(from_path, to_path=None, compression=None) staticmethod

Extract zip format file.

Source code in mindocr\models\backbones\mindcv_models\download.py
@staticmethod
def extract_zip(from_path: str, to_path: Optional[str] = None, compression: Optional[str] = None) -> None:
    """Extract zip format file."""

    compression_mode = zipfile.ZIP_BZIP2 if compression else zipfile.ZIP_STORED
    with zipfile.ZipFile(from_path, "r", compression=compression_mode) as zip_file:
        zip_file.extractall(to_path)
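`extract_zip` above is a thin wrapper over the standard library; the round trip can be sketched with `zipfile` and `tempfile` (the archive name and member path are illustrative):

```python
import os
import tempfile
import zipfile

def extract_zip(from_path, to_path=None):
    # Same shape as DownLoad.extract_zip, using the default (non-bz2)
    # compression branch.
    with zipfile.ZipFile(from_path, "r") as zip_file:
        zip_file.extractall(to_path)

# Build a small archive, then extract it into a sibling directory.
tmpdir = tempfile.mkdtemp()
archive = os.path.join(tmpdir, "data.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("labels/train.txt", "img_1.jpg\thello\n")

out_dir = os.path.join(tmpdir, "extracted")
extract_zip(archive, out_dir)
extracted = os.path.join(out_dir, "labels", "train.txt")
```

`extractall` recreates the member's directory structure under `to_path`, which is why `extract_archive` can hand it a bare target directory.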
mindocr.models.backbones.mindcv_models.dpn

MindSpore implementation of DPN. Refer to: Dual Path Networks

mindocr.models.backbones.mindcv_models.dpn.BottleBlock

Bases: nn.Cell

A block for the Dual Path Architecture

Source code in mindocr\models\backbones\mindcv_models\dpn.py
77
class BottleBlock(nn.Cell):
    """A block for the Dual Path Architecture"""

    def __init__(
        self,
        in_channel: int,
        num_1x1_a: int,
        num_3x3_b: int,
        num_1x1_c: int,
        inc: int,
        g: int,
        key_stride: int,
    ):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(in_channel, eps=1e-3, momentum=0.9)
        self.conv1 = nn.Conv2d(in_channel, num_1x1_a, 1, stride=1)
        self.bn2 = nn.BatchNorm2d(num_1x1_a, eps=1e-3, momentum=0.9)
        self.conv2 = nn.Conv2d(num_1x1_a, num_3x3_b, 3, key_stride, pad_mode="pad", padding=1, group=g)
        self.bn3 = nn.BatchNorm2d(num_3x3_b, eps=1e-3, momentum=0.9)
        self.conv3_r = nn.Conv2d(num_3x3_b, num_1x1_c, 1, stride=1)
        self.conv3_d = nn.Conv2d(num_3x3_b, inc, 1, stride=1)

        self.relu = nn.ReLU()

    def construct(self, x: Tensor):
        x = self.bn1(x)
        x = self.relu(x)
        x = self.conv1(x)
        x = self.bn2(x)
        x = self.relu(x)
        x = self.conv2(x)
        x = self.bn3(x)
        x = self.relu(x)
        return (self.conv3_r(x), self.conv3_d(x))
mindocr.models.backbones.mindcv_models.dpn.DPN

Bases: nn.Cell

DPN model class, based on "Dual Path Networks" <https://arxiv.org/pdf/1707.01629.pdf>_

PARAMETER DESCRIPTION
num_init_channel

number of output channels of the stem block. Default: 64.

TYPE: int DEFAULT: 64

k_r

channel multiplier for the bottleneck width of each stage. Default: 96.

TYPE: int DEFAULT: 96

g

number of groups in the grouped conv2d. Default: 32.

TYPE: int DEFAULT: 32

k_sec

number of blocks in each of the four stages. Default: (3, 4, 20, 3).

TYPE: Tuple[int] DEFAULT: (3, 4, 20, 3)

inc_sec

channels added by the dense path per block in each stage. Default: (16, 32, 24, 128).

TYPE: Tuple[int] DEFAULT: (16, 32, 24, 128)

in_channels

number of input channels. Default: 3.

TYPE: int DEFAULT: 3

num_classes

number of classification classes. Default: 1000.

TYPE: int DEFAULT: 1000

Source code in mindocr\models\backbones\mindcv_models\dpn.py
class DPN(nn.Cell):
    r"""DPN model class, based on
    `"Dual Path Networks" <https://arxiv.org/pdf/1707.01629.pdf>`_

    Args:
        num_init_channel: number of output channels of the stem block. Default: 64.
        k_r: channel multiplier for the bottleneck width of each stage. Default: 96.
        g: number of groups in the grouped conv2d. Default: 32.
        k_sec: number of blocks in each of the four stages. Default: (3, 4, 20, 3).
        inc_sec: channels added by the dense path per block in each stage. Default: (16, 32, 24, 128).
        in_channels: number of input channels. Default: 3.
        num_classes: number of classification classes. Default: 1000.
    """

    def __init__(
        self,
        num_init_channel: int = 64,
        k_r: int = 96,
        g: int = 32,
        k_sec: Tuple[int, int, int, int] = (3, 4, 20, 3),
        inc_sec: Tuple[int, int, int, int] = (16, 32, 24, 128),
        in_channels: int = 3,
        num_classes: int = 1000,
    ):
        super().__init__()
        blocks = OrderedDict()

        # conv1
        blocks["conv1"] = nn.SequentialCell(OrderedDict([
            ("conv", nn.Conv2d(in_channels, num_init_channel, kernel_size=7, stride=2, pad_mode="pad", padding=3)),
            ("norm", nn.BatchNorm2d(num_init_channel, eps=1e-3, momentum=0.9)),
            ("relu", nn.ReLU()),
            ("maxpool", nn.MaxPool2d(kernel_size=3, stride=2, pad_mode="same")),
        ]))

        # conv2
        bw = 256
        inc = inc_sec[0]
        r = int((k_r * bw) / 256)
        blocks["conv2_1"] = DualPathBlock(num_init_channel, r, r, bw, inc, g, "proj", False)
        in_channel = bw + 3 * inc
        for i in range(2, k_sec[0] + 1):
            blocks[f"conv2_{i}"] = DualPathBlock(in_channel, r, r, bw, inc, g, "normal")
            in_channel += inc

        # conv3
        bw = 512
        inc = inc_sec[1]
        r = int((k_r * bw) / 256)
        blocks["conv3_1"] = DualPathBlock(in_channel, r, r, bw, inc, g, "down")
        in_channel = bw + 3 * inc
        for i in range(2, k_sec[1] + 1):
            blocks[f"conv3_{i}"] = DualPathBlock(in_channel, r, r, bw, inc, g, "normal")
            in_channel += inc

        # conv4
        bw = 1024
        inc = inc_sec[2]
        r = int((k_r * bw) / 256)
        blocks["conv4_1"] = DualPathBlock(in_channel, r, r, bw, inc, g, "down")
        in_channel = bw + 3 * inc
        for i in range(2, k_sec[2] + 1):
            blocks[f"conv4_{i}"] = DualPathBlock(in_channel, r, r, bw, inc, g, "normal")
            in_channel += inc

        # conv5
        bw = 2048
        inc = inc_sec[3]
        r = int((k_r * bw) / 256)
        blocks["conv5_1"] = DualPathBlock(in_channel, r, r, bw, inc, g, "down")
        in_channel = bw + 3 * inc
        for i in range(2, k_sec[3] + 1):
            blocks[f"conv5_{i}"] = DualPathBlock(in_channel, r, r, bw, inc, g, "normal")
            in_channel += inc

        self.features = nn.SequentialCell(blocks)
        self.conv5_x = nn.SequentialCell(OrderedDict([
            ("norm", nn.BatchNorm2d(in_channel, eps=1e-3, momentum=0.9)),
            ("relu", nn.ReLU()),
        ]))
        self.avgpool = GlobalAvgPooling()
        self.classifier = nn.Dense(in_channel, num_classes)
        self._initialize_weights()

    def _initialize_weights(self) -> None:
        """Initialize weights for cells."""
        for _, cell in self.cells_and_names():
            if isinstance(cell, nn.Conv2d):
                cell.weight.set_data(
                    init.initializer(init.HeNormal(math.sqrt(5), mode="fan_out", nonlinearity="relu"),
                                     cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(
                        init.initializer(init.HeUniform(math.sqrt(5), mode="fan_in", nonlinearity="leaky_relu"),
                                         cell.bias.shape, cell.bias.dtype))
            elif isinstance(cell, nn.BatchNorm2d):
                cell.gamma.set_data(init.initializer("ones", cell.gamma.shape, cell.gamma.dtype))
                cell.beta.set_data(init.initializer("zeros", cell.beta.shape, cell.beta.dtype))
            elif isinstance(cell, nn.Dense):
                cell.weight.set_data(
                    init.initializer(init.HeUniform(math.sqrt(5), mode="fan_in", nonlinearity="leaky_relu"),
                                     cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(init.initializer("zeros", cell.bias.shape, cell.bias.dtype))

    def forward_feature(self, x: Tensor) -> Tensor:
        x = self.features(x)
        x = ops.concat(x, axis=1)
        x = self.conv5_x(x)
        return x

    def forward_head(self, x: Tensor) -> Tensor:
        x = self.avgpool(x)
        x = self.classifier(x)
        return x

    def construct(self, x: Tensor) -> Tensor:
        x = self.forward_feature(x)
        x = self.forward_head(x)
        return x
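The channel bookkeeping in the stage loops above can be traced with plain arithmetic: each "down" block resets the running width to `bw + 3 * inc`, and every following "normal" block appends `inc` channels. A minimal sketch using the dpn92 configuration (`k_sec=(3, 4, 20, 3)`, `inc_sec=(16, 32, 24, 128)`); note `bw=512` for conv3 is an assumption (that setup line sits above this excerpt), while `bw=1024` and `bw=2048` for conv4/conv5 are shown in the source:

```python
def stage_out_channels(bw: int, inc: int, num_blocks: int) -> int:
    """Output width of one DPN stage, mirroring the loops above:
    the first ("down") block yields bw + 3 * inc, and each of the
    remaining ("normal") blocks adds inc channels."""
    in_channel = bw + 3 * inc           # after the "down" block
    for _ in range(2, num_blocks + 1):  # the "normal" blocks
        in_channel += inc
    return in_channel

# conv3 of dpn92: bw=512 (assumed), inc=inc_sec[1]=32, k_sec[1]=4 blocks
print(stage_out_channels(512, 32, 4))     # -> 704
# conv5 of dpn92: bw=2048, inc=inc_sec[3]=128, k_sec[3]=3 blocks
print(stage_out_channels(2048, 128, 3))   # -> 2688
```

The final value (2688 for dpn92) is the `in_channel` that reaches `conv5_x`, `avgpool`, and the classifier.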
mindocr.models.backbones.mindcv_models.dpn.DualPathBlock

Bases: nn.Cell

A block for Dual Path Networks that combines the projection, residual, and densely connected paths

Source code in mindocr\models\backbones\mindcv_models\dpn.py
class DualPathBlock(nn.Cell):
    """A block for Dual Path Networks that combines the projection, residual, and densely connected paths"""

    def __init__(
        self,
        in_channel: int,
        num_1x1_a: int,
        num_3x3_b: int,
        num_1x1_c: int,
        inc: int,
        g: int,
        _type: str = "normal",
        cat_input: bool = True,
    ):
        super().__init__()
        self.num_1x1_c = num_1x1_c

        if _type == "proj":
            key_stride = 1
            self.has_proj = True
        if _type == "down":
            key_stride = 2
            self.has_proj = True
        if _type == "normal":
            key_stride = 1
            self.has_proj = False

        self.cat_input = cat_input

        if self.has_proj:
            self.c1x1_w_bn = nn.BatchNorm2d(in_channel, eps=1e-3, momentum=0.9)
            self.c1x1_w_relu = nn.ReLU()
            self.c1x1_w_r = nn.Conv2d(in_channel, num_1x1_c, kernel_size=1, stride=key_stride,
                                      pad_mode="pad", padding=0)
            self.c1x1_w_d = nn.Conv2d(in_channel, 2 * inc, kernel_size=1, stride=key_stride,
                                      pad_mode="pad", padding=0)

        self.layers = BottleBlock(in_channel, num_1x1_a, num_3x3_b, num_1x1_c, inc, g, key_stride)

    def construct(self, x: Tensor):
        if self.cat_input:
            data_in = ops.concat(x, axis=1)
        else:
            data_in = x

        if self.has_proj:
            data_o = self.c1x1_w_bn(data_in)
            data_o = self.c1x1_w_relu(data_o)
            data_o1 = self.c1x1_w_r(data_o)
            data_o2 = self.c1x1_w_d(data_o)
        else:
            data_o1 = x[0]
            data_o2 = x[1]

        out = self.layers(data_in)
        summ = ops.add(data_o1, out[0])
        dense = ops.concat((data_o2, out[1]), axis=1)
        return (summ, dense)
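The tuple returned by `construct` carries the two paths: a ResNet-style residual path that is summed, and a DenseNet-style path that grows by concatenation. A toy sketch of that combine step for the no-projection case, with plain lists standing in for per-channel tensors (the names here are illustrative, not from the source):

```python
def dual_path_combine(res_in, dense_in, block_res_out, block_dense_out):
    """Mirror of the tail of DualPathBlock.construct when has_proj is False:
    the residual path is added elementwise, the dense path is concatenated."""
    summ = [a + b for a, b in zip(res_in, block_res_out)]  # ops.add
    dense = dense_in + block_dense_out                     # ops.concat on channels
    return summ, dense

res, dense = dual_path_combine([1, 2], [5], [3, 4], [6, 7])
print(res)    # [4, 6]    -- residual path keeps its width
print(dense)  # [5, 6, 7] -- dense path grew from 1 to 3 "channels"
```

This is why `in_channel += inc` in the DPN stage loops: every block appends `inc` channels to the dense path while the residual path's width stays fixed at `bw`.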
mindocr.models.backbones.mindcv_models.dpn.dpn107(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get a 107-layer DPN model. Refer to the base class models.DPN for more details.

Source code in mindocr\models\backbones\mindcv_models\dpn.py
@register_model
def dpn107(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> DPN:
    """Get 107 layers DPN model.
     Refer to the base class `models.DPN` for more details."""
    default_cfg = default_cfgs["dpn107"]
    model = DPN(num_init_channel=128, k_r=200, g=50, k_sec=(4, 8, 20, 3), inc_sec=(20, 64, 64, 128),
                num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
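The `@register_model` decorator on these factories follows the common name-to-constructor registry pattern, so a model can be created from its string name. A minimal, hypothetical version of that idea (the real decorator in the source also records default configs; everything below, including `create_model` and the stub, is illustrative only):

```python
_model_registry = {}

def register_model(fn):
    """Record a factory function under its own name."""
    _model_registry[fn.__name__] = fn
    return fn

def create_model(name, **kwargs):
    """Look up a registered factory by name and call it."""
    if name not in _model_registry:
        raise ValueError(f"unknown model '{name}'")
    return _model_registry[name](**kwargs)

@register_model
def dpn92_stub(num_classes=1000):
    # stand-in for a real factory that would build and return a DPN
    return {"arch": "dpn92", "num_classes": num_classes}

print(create_model("dpn92_stub", num_classes=10))
```

The registry is what lets a config file name a backbone as a string (e.g. `"dpn92"`) and have the right factory invoked with `pretrained`, `num_classes`, and `in_channels` forwarded as keyword arguments.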
mindocr.models.backbones.mindcv_models.dpn.dpn131(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get a 131-layer DPN model. Refer to the base class models.DPN for more details.

Source code in mindocr\models\backbones\mindcv_models\dpn.py
@register_model
def dpn131(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> DPN:
    """Get 131 layers DPN model.
     Refer to the base class `models.DPN` for more details."""
    default_cfg = default_cfgs["dpn131"]
    model = DPN(num_init_channel=128, k_r=160, g=40, k_sec=(4, 8, 28, 3), inc_sec=(16, 32, 32, 128),
                num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.dpn.dpn92(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get a 92-layer DPN model. Refer to the base class models.DPN for more details.

Source code in mindocr\models\backbones\mindcv_models\dpn.py
@register_model
def dpn92(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> DPN:
    """Get 92 layers DPN model.
     Refer to the base class `models.DPN` for more details."""
    default_cfg = default_cfgs["dpn92"]
    model = DPN(num_init_channel=64, k_r=96, g=32, k_sec=(3, 4, 20, 3), inc_sec=(16, 32, 24, 128),
                num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.dpn.dpn98(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get a 98-layer DPN model. Refer to the base class models.DPN for more details.

Source code in mindocr\models\backbones\mindcv_models\dpn.py
@register_model
def dpn98(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> DPN:
    """Get 98 layers DPN model.
     Refer to the base class `models.DPN` for more details."""
    default_cfg = default_cfgs["dpn98"]
    model = DPN(num_init_channel=96, k_r=160, g=40, k_sec=(3, 6, 20, 3), inc_sec=(16, 32, 32, 128),
                num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.edgenext

MindSpore implementation of edgenext. Refer to EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications.

mindocr.models.backbones.mindcv_models.edgenext.EdgeNeXt

Bases: nn.Cell

EdgeNeXt model class, based on "Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision" <https://arxiv.org/abs/2206.10589>

PARAMETER DESCRIPTION
in_channels

number of input channels. Default: 3

num_classes

number of classification classes. Default: 1000

DEFAULT: 1000

depths

the depths of each layer. Default: [3, 3, 9, 3]

DEFAULT: [3, 3, 9, 3]

dims

the middle dim of each layer. Default: [24, 48, 88, 168]

DEFAULT: [24, 48, 88, 168]

global_block

number of global block. Default: [0, 0, 0, 3]

DEFAULT: [0, 0, 0, 3]

global_block_type

type of global block. Default: ['None', 'None', 'None', 'SDTA']

DEFAULT: ['None', 'None', 'None', 'SDTA']

drop_path_rate

Stochastic Depth. Default: 0.

DEFAULT: 0.0

layer_scale_init_value

value of layer scale initialization. Default: 1e-6

DEFAULT: 1e-06

head_init_scale

scale of head initialization. Default: 1.

DEFAULT: 1.0

expan_ratio

ratio of expansion. Default: 4

DEFAULT: 4

kernel_sizes

kernel sizes of different stages. Default: [7, 7, 7, 7]

DEFAULT: [7, 7, 7, 7]

heads

number of attention heads. Default: [8, 8, 8, 8]

DEFAULT: [8, 8, 8, 8]

use_pos_embd_xca

use position embedding in xca or not. Default: [False, False, False, False]

DEFAULT: [False, False, False, False]

use_pos_embd_global

use position embedding globally or not. Default: False

DEFAULT: False

d2_scales

scales of splitting channels

DEFAULT: [2, 3, 4, 5]

Source code in mindocr\models\backbones\mindcv_models\edgenext.py
class EdgeNeXt(nn.Cell):
    r"""EdgeNeXt model class, based on
    `"Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision" <https://arxiv.org/abs/2206.10589>`_

    Args:
        in_channels: number of input channels. Default: 3
        num_classes: number of classification classes. Default: 1000
        depths: the depths of each layer. Default: [3, 3, 9, 3]
        dims: the middle dim of each layer. Default: [24, 48, 88, 168]
        global_block: number of global block. Default: [0, 0, 0, 3]
        global_block_type: type of global block. Default: ['None', 'None', 'None', 'SDTA']
        drop_path_rate: Stochastic Depth. Default: 0.
        layer_scale_init_value: value of layer scale initialization. Default: 1e-6
        head_init_scale: scale of head initialization. Default: 1.
        expan_ratio: ratio of expansion. Default: 4
        kernel_sizes: kernel sizes of different stages. Default: [7, 7, 7, 7]
        heads: number of attention heads. Default: [8, 8, 8, 8]
        use_pos_embd_xca: use position embedding in xca or not. Default: [False, False, False, False]
        use_pos_embd_global: use position embedding globally or not. Default: False
        d2_scales: scales of splitting channels
    """
    def __init__(self, in_chans=3, num_classes=1000,
                 depths=[3, 3, 9, 3], dims=[24, 48, 88, 168],
                 global_block=[0, 0, 0, 3], global_block_type=["None", "None", "None", "SDTA"],
                 drop_path_rate=0., layer_scale_init_value=1e-6, head_init_scale=1., expan_ratio=4,
                 kernel_sizes=[7, 7, 7, 7], heads=[8, 8, 8, 8], use_pos_embd_xca=[False, False, False, False],
                 use_pos_embd_global=False, d2_scales=[2, 3, 4, 5], **kwargs):
        super().__init__()
        for g in global_block_type:
            assert g in ["None", "SDTA"]
        if use_pos_embd_global:
            self.pos_embd = PositionalEncodingFourier(dim=dims[0])
        else:
            self.pos_embd = None
        self.downsample_layers = nn.CellList()  # stem and 3 intermediate downsampling conv layers
        stem = nn.SequentialCell(
            nn.Conv2d(in_chans, dims[0], kernel_size=4, stride=4, has_bias=True),
            LayerNorm((dims[0],), epsilon=1e-6, norm_axis=1),
        )
        self.downsample_layers.append(stem)
        for i in range(3):
            downsample_layer = nn.SequentialCell(
                LayerNorm((dims[i],), epsilon=1e-6, norm_axis=1),
                nn.Conv2d(dims[i], dims[i + 1], kernel_size=2, stride=2, has_bias=True),
            )
            self.downsample_layers.append(downsample_layer)

        self.stages = nn.CellList()  # 4 feature resolution stages, each consisting of multiple residual blocks
        dp_rates = list(np.linspace(0, drop_path_rate, sum(depths)))
        cur = 0
        for i in range(4):
            stage_blocks = []
            for j in range(depths[i]):
                if j > depths[i] - global_block[i] - 1:
                    if global_block_type[i] == "SDTA":
                        stage_blocks.append(SDTAEncoder(dim=dims[i], drop_path=dp_rates[cur + j],
                                                        expan_ratio=expan_ratio, scales=d2_scales[i],
                                                        use_pos_emb=use_pos_embd_xca[i], num_heads=heads[i]))
                    else:
                        raise NotImplementedError
                else:
                    stage_blocks.append(ConvEncoder(dim=dims[i], drop_path=dp_rates[cur + j],
                                                    layer_scale_init_value=layer_scale_init_value,
                                                    expan_ratio=expan_ratio, kernel_size=kernel_sizes[i]))

            self.stages.append(nn.SequentialCell(*stage_blocks))
            cur += depths[i]
        self.norm = nn.LayerNorm((dims[-1],), epsilon=1e-6)  # Final norm layer
        self.head = nn.Dense(dims[-1], num_classes)

        # self.head_dropout = nn.Dropout(kwargs["classifier_dropout"])
        self.head_dropout = nn.Dropout(1.0)  # keep_prob=1.0, i.e. head dropout is effectively disabled
        self.head_init_scale = head_init_scale
        self._initialize_weights()

    def _initialize_weights(self) -> None:
        """Initialize weights for cells."""
        for _, cell in self.cells_and_names():
            if isinstance(cell, (nn.Dense, nn.Conv2d)):
                cell.weight.set_data(
                    init.initializer(init.TruncatedNormal(sigma=0.02), cell.weight.shape, cell.weight.dtype)
                )
                if isinstance(cell, nn.Dense) and cell.bias is not None:
                    cell.bias.set_data(init.initializer(init.Zero(), cell.bias.shape, cell.bias.dtype))
            elif isinstance(cell, (nn.LayerNorm)):
                cell.gamma.set_data(init.initializer(init.One(), cell.gamma.shape, cell.gamma.dtype))
                cell.beta.set_data(init.initializer(init.Zero(), cell.beta.shape, cell.beta.dtype))
        self.head.weight.set_data(self.head.weight * self.head_init_scale)
        self.head.bias.set_data(self.head.bias * self.head_init_scale)

    def forward_features(self, x):
        x = self.downsample_layers[0](x)
        x = self.stages[0](x)
        if self.pos_embd is not None:
            B, C, H, W = x.shape
            x = x + self.pos_embd(B, H, W)
        for i in range(1, 4):
            x = self.downsample_layers[i](x)
            x = self.stages[i](x)
        return self.norm(x.mean([-2, -1]))  # Global average pooling, (N, C, H, W) -> (N, C)

    def construct(self, x):
        x = self.forward_features(x)
        x = self.head(self.head_dropout(x))
        return x
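Two scheduling details in `__init__` above are easy to verify by hand: `dp_rates` assigns each block a linearly increasing drop-path rate over the whole network, and the `j > depths[i] - global_block[i] - 1` test places the global (SDTA) blocks at the *end* of each stage. A pure-Python trace (with `linspace` re-implemented to avoid the numpy dependency, and an illustrative nonzero `drop_path_rate` since the default is 0.0):

```python
def linspace(start, stop, num):
    """Minimal stand-in for np.linspace(start, stop, num)."""
    if num == 1:
        return [start]
    step = (stop - start) / (num - 1)
    out = [start + i * step for i in range(num)]
    out[-1] = stop  # pin the endpoint, as np.linspace does
    return out

depths = [3, 3, 9, 3]
global_block = [0, 0, 0, 3]  # defaults: SDTA blocks only in the last stage
drop_path_rate = 0.1         # illustrative; the default is 0.0

dp_rates = linspace(0, drop_path_rate, sum(depths))
layout = []
for i in range(4):
    # same test as __init__: the last global_block[i] blocks become SDTA
    stage = ["SDTA" if j > depths[i] - global_block[i] - 1 else "Conv"
             for j in range(depths[i])]
    layout.append(stage)

print(layout[3])     # ['SDTA', 'SDTA', 'SDTA'] -- whole last stage is global
print(dp_rates[-1])  # 0.1 -- the deepest block gets the full rate
```

With `global_block=[0, 1, 1, 1]` (used by the edgenext_* factories below), only the final block of stages 2-4 becomes an SDTA encoder.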
mindocr.models.backbones.mindcv_models.edgenext.LayerNorm

Bases: nn.LayerNorm

LayerNorm for channels_first tensors with 2d spatial dimensions (i.e. N, C, H, W).

Source code in mindocr\models\backbones\mindcv_models\edgenext.py
class LayerNorm(nn.LayerNorm):
    r"""LayerNorm for channels_first tensors with 2d spatial dimensions (ie N, C, H, W)."""

    def __init__(
        self,
        normalized_shape: Tuple[int],
        epsilon: float,
        norm_axis: int = -1,
    ) -> None:
        super().__init__(normalized_shape=normalized_shape, epsilon=epsilon)
        assert norm_axis in (-1, 1), "ConvNextLayerNorm's norm_axis must be 1 or -1."
        self.norm_axis = norm_axis

    def construct(self, input_x: Tensor) -> Tensor:
        if self.norm_axis == -1:
            y, _, _ = self.layer_norm(input_x, self.gamma, self.beta)
        else:
            input_x = ops.transpose(input_x, (0, 2, 3, 1))
            y, _, _ = self.layer_norm(input_x, self.gamma, self.beta)
            y = ops.transpose(y, (0, 3, 1, 2))
        return y
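For `norm_axis=1` the cell normalizes a channels-first input by permuting NCHW to NHWC, normalizing over the trailing axis, and permuting back. The permutation itself is just an index shuffle; a toy version on nested lists (not the library call) makes the `(0, 2, 3, 1)` ordering concrete:

```python
def transpose_nchw_to_nhwc(x):
    """Rearrange a nested-list 'tensor' from (N, C, H, W) to (N, H, W, C),
    mirroring ops.transpose(input_x, (0, 2, 3, 1)) on plain lists."""
    return [[[[x[n][c][h][w] for c in range(len(x[n]))]
              for w in range(len(x[n][0][h]))]
             for h in range(len(x[n][0]))]
            for n in range(len(x))]

# a 1x2x1x2 example: two channels, one row, two columns
x = [[[[1, 2]], [[3, 4]]]]
print(transpose_nchw_to_nhwc(x))  # [[[[1, 3], [2, 4]]]]
```

After the shuffle, each innermost list holds one pixel's values across all channels, which is exactly the axis `nn.LayerNorm` reduces over before the inverse transpose restores NCHW.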
mindocr.models.backbones.mindcv_models.edgenext.edgenext_base(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get edgenext_base model. Refer to the base class models.EdgeNeXt for more details.

Source code in mindocr\models\backbones\mindcv_models\edgenext.py
@register_model
def edgenext_base(pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3, **kwargs) -> EdgeNeXt:
    """Get edgenext_base model.
    Refer to the base class `models.EdgeNeXt` for more details."""
    default_cfg = default_cfgs["edgenext_base"]
    model = EdgeNeXt(
        depths=[3, 3, 9, 3],
        dims=[80, 160, 288, 584],
        expan_ratio=4,
        num_classes=num_classes,
        global_block=[0, 1, 1, 1],
        global_block_type=["None", "SDTA", "SDTA", "SDTA"],
        use_pos_embd_xca=[False, True, False, False],
        kernel_sizes=[3, 5, 7, 9],
        d2_scales=[2, 2, 3, 4],
        **kwargs
    )
    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)
    return model
mindocr.models.backbones.mindcv_models.edgenext.edgenext_small(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get edgenext_small model. Refer to the base class models.EdgeNeXt for more details.

Source code in mindocr\models\backbones\mindcv_models\edgenext.py
@register_model
def edgenext_small(pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3, **kwargs) -> EdgeNeXt:
    """Get edgenext_small model.
    Refer to the base class `models.EdgeNeXt` for more details."""
    default_cfg = default_cfgs["edgenext_small"]
    model = EdgeNeXt(
        depths=[3, 3, 9, 3],
        dims=[48, 96, 160, 304],
        expan_ratio=4,
        num_classes=num_classes,
        global_block=[0, 1, 1, 1],
        global_block_type=["None", "SDTA", "SDTA", "SDTA"],
        use_pos_embd_xca=[False, True, False, False],
        kernel_sizes=[3, 5, 7, 9],
        d2_scales=[2, 2, 3, 4],
        **kwargs
    )
    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)
    return model
mindocr.models.backbones.mindcv_models.edgenext.edgenext_x_small(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get edgenext_x_small model. Refer to the base class models.EdgeNeXt for more details.

Source code in mindocr\models\backbones\mindcv_models\edgenext.py
@register_model
def edgenext_x_small(pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3, **kwargs) -> EdgeNeXt:
    """Get edgenext_x_small model.
    Refer to the base class `models.EdgeNeXt` for more details."""
    default_cfg = default_cfgs["edgenext_x_small"]
    model = EdgeNeXt(
        depths=[3, 3, 9, 3],
        dims=[32, 64, 100, 192],
        expan_ratio=4,
        num_classes=num_classes,
        global_block=[0, 1, 1, 1],
        global_block_type=["None", "SDTA", "SDTA", "SDTA"],
        use_pos_embd_xca=[False, True, False, False],
        kernel_sizes=[3, 5, 7, 9],
        heads=[4, 4, 4, 4],
        d2_scales=[2, 2, 3, 4],
        **kwargs
    )
    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)
    return model
mindocr.models.backbones.mindcv_models.edgenext.edgenext_xx_small(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get edgenext_xx_small model. Refer to the base class models.EdgeNeXt for more details.

Source code in mindocr\models\backbones\mindcv_models\edgenext.py
@register_model
def edgenext_xx_small(pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3, **kwargs) -> EdgeNeXt:
    """Get edgenext_xx_small model.
        Refer to the base class `models.EdgeNeXt` for more details."""
    default_cfg = default_cfgs["edgenext_xx_small"]
    model = EdgeNeXt(
        depths=[2, 2, 6, 2],
        dims=[24, 48, 88, 168],
        expan_ratio=4,
        global_block=[0, 1, 1, 1],
        global_block_type=['None', 'SDTA', 'SDTA', 'SDTA'],
        use_pos_embd_xca=[False, True, False, False],
        kernel_sizes=[3, 5, 7, 9],
        heads=[4, 4, 4, 4],
        d2_scales=[2, 2, 3, 4],
        **kwargs
    )
    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.efficientnet

EfficientNet Architecture.

mindocr.models.backbones.mindcv_models.efficientnet.EfficientNet

Bases: nn.Cell

EfficientNet architecture. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>.

PARAMETER DESCRIPTION
arch

The name of the model.

TYPE: str

dropout_rate

The dropout rate of efficientnet.

TYPE: float

width_mult

The ratio of the channel. Default: 1.0.

TYPE: float DEFAULT: 1.0

depth_mult

The ratio of num_layers. Default: 1.0.

TYPE: float DEFAULT: 1.0

in_channels

The input channels. Default: 3.

TYPE: int DEFAULT: 3

num_classes

The number of classes. Default: 1000.

TYPE: int DEFAULT: 1000

inverted_residual_setting

The settings of block. Default: None.

TYPE: Sequence[Union[MBConvConfig, FusedMBConvConfig]] DEFAULT: None

keep_prob

The dropout rate of MBConv. Default: 0.2.

TYPE: float DEFAULT: 0.2

norm_layer

The normalization layer. Default: None.

TYPE: nn.Cell DEFAULT: None

Inputs
  • x (Tensor) - Tensor of shape (N, C_in, H_in, W_in).
Outputs

Tensor of shape (N, 1000).

Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
class EfficientNet(nn.Cell):
    """
    EfficientNet architecture.
    `EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>`_.

    Args:
        arch (str): The name of the model.
        dropout_rate (float): The dropout rate of efficientnet.
        width_mult (float): The ratio of the channel. Default: 1.0.
        depth_mult (float): The ratio of num_layers. Default: 1.0.
        in_channels (int): The input channels. Default: 3.
        num_classes (int): The number of classes. Default: 1000.
        inverted_residual_setting (Sequence[Union[MBConvConfig, FusedMBConvConfig]], optional): The settings of block.
            Default: None.
        keep_prob (float): The dropout rate of MBConv. Default: 0.2.
        norm_layer (nn.Cell, optional): The normalization layer. Default: None.

    Inputs:
        - **x** (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.

    Outputs:
        Tensor of shape :math:`(N, 1000)`.
    """

    def __init__(
        self,
        arch: str,
        dropout_rate: float,
        width_mult: float = 1.0,
        depth_mult: float = 1.0,
        in_channels: int = 3,
        num_classes: int = 1000,
        inverted_residual_setting: Optional[Sequence[Union[MBConvConfig, FusedMBConvConfig]]] = None,
        keep_prob: float = 0.2,
        norm_layer: Optional[nn.Cell] = None,
    ) -> None:
        super().__init__()
        self.last_channel = None

        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
            if width_mult >= 1.6:
                norm_layer = partial(nn.BatchNorm2d, eps=0.001, momentum=0.99)

        layers: List[nn.Cell] = []

        if not inverted_residual_setting:
            if arch.startswith("efficientnet_b"):
                bneck_conf = partial(MBConvConfig, width_cnf=width_mult, depth_cnf=depth_mult)
                inverted_residual_setting = [
                    bneck_conf(1, 3, 1, 32, 16, 1),
                    bneck_conf(6, 3, 2, 16, 24, 2),
                    bneck_conf(6, 5, 2, 24, 40, 2),
                    bneck_conf(6, 3, 2, 40, 80, 3),
                    bneck_conf(6, 5, 1, 80, 112, 3),
                    bneck_conf(6, 5, 2, 112, 192, 4),
                    bneck_conf(6, 3, 1, 192, 320, 1),
                ]
            elif arch.startswith("efficientnet_v2_s"):
                inverted_residual_setting = [
                    FusedMBConvConfig(1, 3, 1, 24, 24, 2),
                    FusedMBConvConfig(4, 3, 2, 24, 48, 4),
                    FusedMBConvConfig(4, 3, 2, 48, 64, 4),
                    MBConvConfig(4, 3, 2, 64, 128, 6),
                    MBConvConfig(6, 3, 1, 128, 160, 9),
                    MBConvConfig(6, 3, 2, 160, 256, 15),
                ]
                self.last_channel = 1280
            elif arch.startswith("efficientnet_v2_m"):
                inverted_residual_setting = [
                    FusedMBConvConfig(1, 3, 1, 24, 24, 3),
                    FusedMBConvConfig(4, 3, 2, 24, 48, 5),
                    FusedMBConvConfig(4, 3, 2, 48, 80, 5),
                    MBConvConfig(4, 3, 2, 80, 160, 7),
                    MBConvConfig(6, 3, 1, 160, 176, 14),
                    MBConvConfig(6, 3, 2, 176, 304, 18),
                    MBConvConfig(6, 3, 1, 304, 512, 5),
                ]
                self.last_channel = 1280
            elif arch.startswith("efficientnet_v2_l"):
                inverted_residual_setting = [
                    FusedMBConvConfig(1, 3, 1, 32, 32, 4),
                    FusedMBConvConfig(4, 3, 2, 32, 64, 7),
                    FusedMBConvConfig(4, 3, 2, 64, 96, 7),
                    MBConvConfig(4, 3, 2, 96, 192, 10),
                    MBConvConfig(6, 3, 1, 192, 224, 19),
                    MBConvConfig(6, 3, 2, 224, 384, 25),
                    MBConvConfig(6, 3, 1, 384, 640, 7),
                ]
                self.last_channel = 1280
            elif arch.startswith("efficientnet_v2_xl"):
                inverted_residual_setting = [
                    FusedMBConvConfig(1, 3, 1, 32, 32, 4),
                    FusedMBConvConfig(4, 3, 2, 32, 64, 8),
                    FusedMBConvConfig(4, 3, 2, 64, 96, 8),
                    MBConvConfig(4, 3, 2, 96, 192, 16),
                    MBConvConfig(6, 3, 1, 192, 256, 24),
                    MBConvConfig(6, 3, 2, 256, 512, 32),
                    MBConvConfig(6, 3, 1, 512, 640, 8),
                ]
                self.last_channel = 1280

        # building first layer
        firstconv_output_channels = inverted_residual_setting[0].input_channels
        layers.extend([
            nn.Conv2d(in_channels, firstconv_output_channels, kernel_size=3, stride=2),
            norm_layer(firstconv_output_channels),
            Swish(),
        ])

        # building MBConv blocks
        total_stage_blocks = sum(cnf.num_layers for cnf in inverted_residual_setting)
        stage_block_id = 0

        # cnf is the settings of block
        for cnf in inverted_residual_setting:
            stage: List[nn.Cell] = []

            # cnf.num_layers is the num of the same block
            for _ in range(cnf.num_layers):
                # copy to avoid modifications. shallow copy is enough
                block_cnf = copy.copy(cnf)

                block = MBConv

                if "FusedMBConvConfig" in str(type(block_cnf)):
                    block = FusedMBConv

                # overwrite info if not the first conv in the stage
                if stage:
                    block_cnf.input_channels = block_cnf.out_channels
                    block_cnf.stride = 1

                # adjust dropout rate of blocks based on the depth of the stage block
                sd_prob = keep_prob * float(stage_block_id + 0.00001) / total_stage_blocks

                stage.append(block(block_cnf, sd_prob, norm_layer))
                stage_block_id += 1

            layers.append(nn.SequentialCell(stage))

        # building last several layers
        lastconv_input_channels = inverted_residual_setting[-1].out_channels
        lastconv_output_channels = self.last_channel if self.last_channel is not None else 4 * lastconv_input_channels
        layers.extend([
            nn.Conv2d(lastconv_input_channels, lastconv_output_channels, kernel_size=1),
            norm_layer(lastconv_output_channels),
            Swish(),
        ])

        self.features = nn.SequentialCell(layers)
        self.avgpool = GlobalAvgPooling()
        self.dropout = nn.Dropout(1 - dropout_rate)
        self.mlp_head = nn.Dense(lastconv_output_channels, num_classes)
        self._initialize_weights()

    def forward_features(self, x: Tensor) -> Tensor:
        x = self.features(x)

        x = self.avgpool(x)

        if self.training:
            x = self.dropout(x)
        return x

    def forward_head(self, x: Tensor) -> Tensor:
        return self.mlp_head(x)

    def construct(self, x: Tensor) -> Tensor:
        """construct"""
        x = self.forward_features(x)
        return self.forward_head(x)

    def _initialize_weights(self) -> None:
        """Initialize weights for cells."""
        for _, cell in self.cells_and_names():
            if isinstance(cell, nn.Dense):
                init_range = 1.0 / np.sqrt(cell.weight.shape[0])
                cell.weight.set_data(weight_init.initializer(Uniform(init_range), cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(weight_init.initializer(weight_init.Zero(), cell.bias.shape, cell.bias.dtype))
            if isinstance(cell, nn.Conv2d):
                out_channel, _, kernel_size_h, kernel_size_w = cell.weight.shape
                stddev = np.sqrt(2 / int(out_channel * kernel_size_h * kernel_size_w))
                cell.weight.set_data(
                    weight_init.initializer(Normal(sigma=stddev), cell.weight.shape, cell.weight.dtype)
                )
                if cell.bias is not None:
                    cell.bias.set_data(weight_init.initializer(weight_init.Zero(), cell.bias.shape, cell.bias.dtype))
mindocr.models.backbones.mindcv_models.efficientnet.EfficientNet.construct(x)

construct

Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
def construct(self, x: Tensor) -> Tensor:
    """construct"""
    x = self.forward_features(x)
    return self.forward_head(x)
mindocr.models.backbones.mindcv_models.efficientnet.FusedMBConv

Bases: nn.Cell

FusedMBConv

Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
class FusedMBConv(nn.Cell):
    """FusedMBConv"""

    def __init__(
        self,
        cnf: FusedMBConvConfig,
        keep_prob: float,
        norm: Optional[nn.Cell] = None,
    ) -> None:
        super().__init__()

        if not 1 <= cnf.stride <= 2:
            raise ValueError("illegal stride value")

        self.shortcut = cnf.stride == 1 and cnf.input_channels == cnf.out_channels

        layers: List[nn.Cell] = []

        expanded_channels = cnf.adjust_channels(cnf.input_channels, cnf.expand_ratio)
        if expanded_channels != cnf.input_channels:
            # fused expand
            layers.extend([
                nn.Conv2d(cnf.input_channels, expanded_channels, kernel_size=cnf.kernel_size,
                          stride=cnf.stride),
                norm(expanded_channels),
                Swish(),
            ])

            # project
            layers.extend([
                nn.Conv2d(expanded_channels, cnf.out_channels, kernel_size=1),
                norm(cnf.out_channels),
            ])
        else:
            layers.extend([
                nn.Conv2d(cnf.input_channels, cnf.out_channels, kernel_size=cnf.kernel_size,
                          stride=cnf.stride),
                norm(cnf.out_channels),
                Swish(),
            ])

        self.block = nn.SequentialCell(layers)
        self.dropout = DropPath(keep_prob)
        self.out_channels = cnf.out_channels

    def construct(self, x) -> Tensor:
        result = self.block(x)
        if self.shortcut:
            result = self.dropout(result)
            result += x
        return result
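FusedMBConv (introduced in the EfficientNetV2 paper) trades MBConv's 1x1 expand + kxk depthwise pair for a single full kxk convolution. An illustrative weight-count comparison, ignoring bias, BN, and SE terms (these helper names are for illustration only):

```python
def mbconv_weight_count(cin: int, cout: int, k: int, expand: int) -> int:
    hidden = cin * expand
    # 1x1 expand + kxk depthwise + 1x1 project (conv weights only)
    return cin * hidden + hidden * k * k + hidden * cout

def fused_mbconv_weight_count(cin: int, cout: int, k: int, expand: int) -> int:
    hidden = cin * expand
    # single fused kxk expand + 1x1 project
    return cin * hidden * k * k + hidden * cout
```

Despite the larger weight count, the fused form can run faster on accelerators for the small-channel early stages, which is why EfficientNetV2 uses it there.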
mindocr.models.backbones.mindcv_models.efficientnet.FusedMBConvConfig

Bases: MBConvConfig

FusedMBConvConfig

Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
class FusedMBConvConfig(MBConvConfig):
    """FusedMBConvConfig"""

    # Stores information listed at Table 4 of the EfficientNetV2 paper
    def __init__(
        self,
        expand_ratio: float,
        kernel_size: int,
        stride: int,
        in_chs: int,
        out_chs: int,
        num_layers: int,
    ) -> None:
        super().__init__(expand_ratio, kernel_size, stride, in_chs, out_chs, num_layers)
mindocr.models.backbones.mindcv_models.efficientnet.MBConv

Bases: nn.Cell

MBConv Module.

PARAMETER DESCRIPTION
cnf

The configuration object holding the block parameters (in_channels, out_channels, num_layers) and the helper functions that scale them by expand_ratio.

TYPE: MBConvConfig

keep_prob

The keep probability for the DropPath (stochastic depth) applied to the residual branch. Default: 0.8.

TYPE: float DEFAULT: 0.8

norm

The BatchNorm Method. Default: None.

TYPE: nn.Cell DEFAULT: None

se_layer

The squeeze-excite Module. Default: SqueezeExcite.

TYPE: nn.Cell DEFAULT: SqueezeExcite

RETURNS DESCRIPTION

Tensor

Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
class MBConv(nn.Cell):
    """
    MBConv Module.

    Args:
        cnf (MBConvConfig): The configuration object holding the block parameters (in_channels,
            out_channels, num_layers) and the helper functions that scale them by expand_ratio.
        keep_prob: The keep probability for the DropPath (stochastic depth) applied to the
            residual branch. Default: 0.8.
        norm (nn.Cell): The BatchNorm Method. Default: None.
        se_layer (nn.Cell): The squeeze-excite Module. Default: SqueezeExcite.

    Returns:
        Tensor
    """

    def __init__(
        self,
        cnf: MBConvConfig,
        keep_prob: float = 0.8,
        norm: Optional[nn.Cell] = None,
        se_layer: Callable[..., nn.Cell] = SqueezeExcite,
    ) -> None:
        super().__init__()

        self.shortcut = cnf.stride == 1 and cnf.input_channels == cnf.out_channels

        layers: List[nn.Cell] = []

        # expand conv: the out_channels is cnf.expand_ratio times of the in_channels.
        expanded_channels = cnf.adjust_channels(cnf.input_channels, cnf.expand_ratio)
        if expanded_channels != cnf.input_channels:
            layers.extend([
                nn.Conv2d(cnf.input_channels, expanded_channels, kernel_size=1),
                norm(expanded_channels),
                Swish(),
            ])

        # depthwise conv: splits the filter into groups.
        layers.extend([
            nn.Conv2d(expanded_channels, expanded_channels, kernel_size=cnf.kernel_size,
                      stride=cnf.stride, group=expanded_channels),
            norm(expanded_channels),
            Swish(),
        ])

        # squeeze and excitation
        squeeze_channels = max(1, cnf.input_channels // 4)
        layers.append(se_layer(in_channels=expanded_channels, rd_channels=squeeze_channels, act_layer=Swish))

        # project
        layers.extend([
            nn.Conv2d(expanded_channels, cnf.out_channels, kernel_size=1),
            norm(cnf.out_channels),
        ])

        self.block = nn.SequentialCell(layers)
        self.dropout = DropPath(keep_prob)
        self.out_channels = cnf.out_channels

    def construct(self, x) -> Tensor:
        result = self.block(x)
        if self.shortcut:
            result = self.dropout(result)
            result += x
        return result
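In `construct`, the residual branch passes through DropPath before being added to the shortcut. A scalar sketch of stochastic depth under the usual convention (an assumption about DropPath's semantics, not MindSpore's implementation):

```python
import random

def drop_path(branch_output: float, keep_prob: float, training: bool = True) -> float:
    # During training, drop the whole residual branch with probability
    # 1 - keep_prob and rescale survivors by 1 / keep_prob so the
    # expected output is unchanged; at inference, pass through untouched.
    if not training or keep_prob >= 1.0:
        return branch_output
    if random.random() < keep_prob:
        return branch_output / keep_prob
    return 0.0
```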
mindocr.models.backbones.mindcv_models.efficientnet.MBConvConfig

The parameters of MBConv that are scaled by expand_ratio.

PARAMETER DESCRIPTION
expand_ratio

The expansion ratio of the hidden channels relative to in_channels.

TYPE: float

kernel_size

The kernel size of the depthwise conv.

TYPE: int

stride

The stride of the depthwise conv.

TYPE: int

in_chs

The input_channels of the MBConv Module.

TYPE: int

out_chs

The output_channels of the MBConv Module.

TYPE: int

num_layers

The number of MBConv modules.

TYPE: int

width_cnf

The channel width multiplier. Default: 1.0.

TYPE: float DEFAULT: 1.0

depth_cnf

The depth multiplier for num_layers. Default: 1.0.

TYPE: float DEFAULT: 1.0

RETURNS DESCRIPTION

None

Examples:

>>> cnf = MBConvConfig(1, 3, 1, 32, 16, 1)
>>> print(cnf.input_channels)
32
Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
class MBConvConfig:
    """
    The parameters of MBConv that are scaled by expand_ratio.

    Args:
        expand_ratio (float): The expansion ratio of the hidden channels relative to in_channels.
        kernel_size (int): The kernel size of the depthwise conv.
        stride (int): The stride of the depthwise conv.
        in_chs (int): The input channels of the MBConv module.
        out_chs (int): The output channels of the MBConv module.
        num_layers (int): The number of MBConv modules.
        width_cnf: The channel width multiplier. Default: 1.0.
        depth_cnf: The depth multiplier for num_layers. Default: 1.0.

    Returns:
        None

    Examples:
        >>> cnf = MBConvConfig(1, 3, 1, 32, 16, 1)
        >>> print(cnf.input_channels)
        32
    """

    def __init__(
        self,
        expand_ratio: float,
        kernel_size: int,
        stride: int,
        in_chs: int,
        out_chs: int,
        num_layers: int,
        width_cnf: float = 1.0,
        depth_cnf: float = 1.0,
    ) -> None:
        self.expand_ratio = expand_ratio
        self.kernel_size = kernel_size
        self.stride = stride
        self.input_channels = self.adjust_channels(in_chs, width_cnf)
        self.out_channels = self.adjust_channels(out_chs, width_cnf)
        self.num_layers = self.adjust_depth(num_layers, depth_cnf)

    @staticmethod
    def adjust_channels(channels: int, width_cnf: float, min_value: Optional[int] = None) -> int:
        """
        Calculate the width of MBConv.

        Args:
            channels (int): The number of channels.
            width_cnf (float): The channel width multiplier.
            min_value (int, optional): The minimum number of channels. Default: None.

        Returns:
            int, the width of MBConv.
        """

        return make_divisible(channels * width_cnf, 8, min_value)

    @staticmethod
    def adjust_depth(num_layers: int, depth_cnf: float) -> int:
        """
        Calculate the depth of MBConv.

        Args:
            num_layers (int): The number of MBConv modules.
            depth_cnf (float): The depth multiplier for num_layers.

        Returns:
            int, the depth of MBConv.
        """

        return int(math.ceil(num_layers * depth_cnf))
mindocr.models.backbones.mindcv_models.efficientnet.MBConvConfig.adjust_channels(channels, width_cnf, min_value=None) staticmethod

Calculate the width of MBConv.

PARAMETER DESCRIPTION
channels

The number of channels.

TYPE: int

width_cnf

The channel width multiplier.

TYPE: float

min_value

The minimum number of channels. Default: None.

TYPE: int DEFAULT: None

RETURNS DESCRIPTION
int

int, the width of MBConv.

Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
@staticmethod
def adjust_channels(channels: int, width_cnf: float, min_value: Optional[int] = None) -> int:
    """
    Calculate the width of MBConv.

    Args:
        channels (int): The number of channels.
        width_cnf (float): The channel width multiplier.
        min_value (int, optional): The minimum number of channels. Default: None.

    Returns:
        int, the width of MBConv.
    """

    return make_divisible(channels * width_cnf, 8, min_value)
mindocr.models.backbones.mindcv_models.efficientnet.MBConvConfig.adjust_depth(num_layers, depth_cnf) staticmethod

Calculate the depth of MBConv.

PARAMETER DESCRIPTION
num_layers

The number of MBConv modules.

TYPE: int

depth_cnf

The depth multiplier for num_layers.

TYPE: float

RETURNS DESCRIPTION
int

int, the depth of MBConv.

Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
@staticmethod
def adjust_depth(num_layers: int, depth_cnf: float) -> int:
    """
    Calculate the depth of MBConv.

    Args:
        num_layers (int): The number of MBConv modules.
        depth_cnf (float): The depth multiplier for num_layers.

    Returns:
        int, the depth of MBConv.
    """

    return int(math.ceil(num_layers * depth_cnf))
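`adjust_channels` rounds the scaled width via `make_divisible`. A self-contained sketch of both helpers, assuming `make_divisible` follows the common MobileNet rounding convention (round to a multiple of 8, never dropping more than 10% below the requested value) — mindcv's implementation may differ in detail:

```python
import math
from typing import Optional

def make_divisible(v: float, divisor: int = 8, min_value: Optional[int] = None) -> int:
    # Assumed MobileNet-style rounding convention.
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:  # never shrink by more than 10%
        new_v += divisor
    return new_v

def adjust_channels(channels: int, width_cnf: float) -> int:
    # Mirrors MBConvConfig.adjust_channels.
    return make_divisible(channels * width_cnf, 8)

def adjust_depth(num_layers: int, depth_cnf: float) -> int:
    # Mirrors MBConvConfig.adjust_depth: always round up.
    return int(math.ceil(num_layers * depth_cnf))
```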
mindocr.models.backbones.mindcv_models.efficientnet.efficientnet_b0(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Constructs an EfficientNet B0 architecture from EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>_.

PARAMETER DESCRIPTION
pretrained

If True, returns a model pretrained on IMAGENET. Default: False.

TYPE: bool DEFAULT: False

num_classes

The number of classes. Default: 1000.

TYPE: int DEFAULT: 1000

in_channels

The number of input channels. Default: 3.

TYPE: int DEFAULT: 3

Inputs
  • x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs

Tensor of shape :math:`(N, CLASSES_{out})`.

Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
@register_model
def efficientnet_b0(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> EfficientNet:
    """
    Constructs an EfficientNet B0 architecture from
    `EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>`_.

    Args:
        pretrained (bool): If True, returns a model pretrained on IMAGENET. Default: False.
        num_classes (int): The number of classes. Default: 1000.
        in_channels (int): The number of input channels. Default: 3.

    Inputs:
        - **x** (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.

    Outputs:
        Tensor of shape :math:`(N, CLASSES_{out})`.
    """
    return _efficientnet("efficientnet_b0", 1.0, 1.0, 0.2, in_channels, num_classes, pretrained, **kwargs)
mindocr.models.backbones.mindcv_models.efficientnet.efficientnet_b1(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Constructs an EfficientNet B1 architecture from EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>_.

PARAMETER DESCRIPTION
pretrained

If True, returns a model pretrained on IMAGENET. Default: False.

TYPE: bool DEFAULT: False

num_classes

The number of classes. Default: 1000.

TYPE: int DEFAULT: 1000

in_channels

The number of input channels. Default: 3.

TYPE: int DEFAULT: 3

Inputs
  • x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs

Tensor of shape :math:`(N, CLASSES_{out})`.

Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
@register_model
def efficientnet_b1(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> EfficientNet:
    """
    Constructs an EfficientNet B1 architecture from
    `EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>`_.

    Args:
        pretrained (bool): If True, returns a model pretrained on IMAGENET. Default: False.
        num_classes (int): The number of classes. Default: 1000.
        in_channels (int): The number of input channels. Default: 3.

    Inputs:
        - **x** (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.

    Outputs:
        Tensor of shape :math:`(N, CLASSES_{out})`.
    """
    return _efficientnet("efficientnet_b1", 1.0, 1.1, 0.2, in_channels, num_classes, pretrained, **kwargs)
mindocr.models.backbones.mindcv_models.efficientnet.efficientnet_b2(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Constructs an EfficientNet B2 architecture from EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>_.

PARAMETER DESCRIPTION
pretrained

If True, returns a model pretrained on IMAGENET. Default: False.

TYPE: bool DEFAULT: False

num_classes

The number of classes. Default: 1000.

TYPE: int DEFAULT: 1000

in_channels

The number of input channels. Default: 3.

TYPE: int DEFAULT: 3

Inputs
  • x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs

Tensor of shape :math:`(N, CLASSES_{out})`.

Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
@register_model
def efficientnet_b2(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> EfficientNet:
    """
    Constructs an EfficientNet B2 architecture from
    `EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>`_.

    Args:
        pretrained (bool): If True, returns a model pretrained on IMAGENET. Default: False.
        num_classes (int): The number of classes. Default: 1000.
        in_channels (int): The number of input channels. Default: 3.

    Inputs:
        - **x** (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.

    Outputs:
        Tensor of shape :math:`(N, CLASSES_{out})`.
    """
    return _efficientnet("efficientnet_b2", 1.1, 1.2, 0.3, in_channels, num_classes, pretrained, **kwargs)
mindocr.models.backbones.mindcv_models.efficientnet.efficientnet_b3(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Constructs an EfficientNet B3 architecture from EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>_.

PARAMETER DESCRIPTION
pretrained

If True, returns a model pretrained on IMAGENET. Default: False.

TYPE: bool DEFAULT: False

num_classes

The number of classes. Default: 1000.

TYPE: int DEFAULT: 1000

in_channels

The number of input channels. Default: 3.

TYPE: int DEFAULT: 3

Inputs
  • x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs

Tensor of shape :math:`(N, CLASSES_{out})`.

Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
@register_model
def efficientnet_b3(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> EfficientNet:
    """
    Constructs an EfficientNet B3 architecture from
    `EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>`_.

    Args:
        pretrained (bool): If True, returns a model pretrained on IMAGENET. Default: False.
        num_classes (int): The number of classes. Default: 1000.
        in_channels (int): The number of input channels. Default: 3.

    Inputs:
        - **x** (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.

    Outputs:
        Tensor of shape :math:`(N, CLASSES_{out})`.
    """
    return _efficientnet("efficientnet_b3", 1.2, 1.4, 0.3, in_channels, num_classes, pretrained, **kwargs)
mindocr.models.backbones.mindcv_models.efficientnet.efficientnet_b4(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Constructs an EfficientNet B4 architecture from EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>_.

PARAMETER DESCRIPTION
pretrained

If True, returns a model pretrained on IMAGENET. Default: False.

TYPE: bool DEFAULT: False

num_classes

The number of classes. Default: 1000.

TYPE: int DEFAULT: 1000

in_channels

The number of input channels. Default: 3.

TYPE: int DEFAULT: 3

Inputs
  • x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs

Tensor of shape :math:`(N, CLASSES_{out})`.

Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
@register_model
def efficientnet_b4(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> EfficientNet:
    """
    Constructs an EfficientNet B4 architecture from
    `EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>`_.

    Args:
        pretrained (bool): If True, returns a model pretrained on IMAGENET. Default: False.
        num_classes (int): The number of classes. Default: 1000.
        in_channels (int): The number of input channels. Default: 3.

    Inputs:
        - **x** (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.

    Outputs:
        Tensor of shape :math:`(N, CLASSES_{out})`.
    """
    return _efficientnet("efficientnet_b4", 1.4, 1.8, 0.4, in_channels, num_classes, pretrained, **kwargs)
mindocr.models.backbones.mindcv_models.efficientnet.efficientnet_b5(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Constructs an EfficientNet B5 architecture from EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>_.

PARAMETER DESCRIPTION
pretrained

If True, returns a model pretrained on IMAGENET. Default: False.

TYPE: bool DEFAULT: False

num_classes

The number of classes. Default: 1000.

TYPE: int DEFAULT: 1000

in_channels

The number of input channels. Default: 3.

TYPE: int DEFAULT: 3

Inputs
  • x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs

Tensor of shape :math:`(N, CLASSES_{out})`.

Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
@register_model
def efficientnet_b5(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> EfficientNet:
    """
    Constructs an EfficientNet B5 architecture from
    `EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>`_.

    Args:
        pretrained (bool): If True, returns a model pretrained on IMAGENET. Default: False.
        num_classes (int): The number of classes. Default: 1000.
        in_channels (int): The number of input channels. Default: 3.

    Inputs:
        - **x** (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.

    Outputs:
        Tensor of shape :math:`(N, CLASSES_{out})`.
    """
    return _efficientnet("efficientnet_b5", 1.6, 2.2, 0.4, in_channels, num_classes, pretrained, **kwargs)
mindocr.models.backbones.mindcv_models.efficientnet.efficientnet_b6(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Constructs an EfficientNet B6 architecture from EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>_.

PARAMETER DESCRIPTION
pretrained

If True, returns a model pretrained on IMAGENET. Default: False.

TYPE: bool DEFAULT: False

num_classes

The number of classes. Default: 1000.

TYPE: int DEFAULT: 1000

in_channels

The number of input channels. Default: 3.

TYPE: int DEFAULT: 3

Inputs
  • x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs

Tensor of shape :math:`(N, CLASSES_{out})`.

Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
@register_model
def efficientnet_b6(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> EfficientNet:
    """
    Constructs an EfficientNet B6 architecture from
    `EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>`_.

    Args:
        pretrained (bool): If True, returns a model pretrained on IMAGENET. Default: False.
        num_classes (int): The number of classes. Default: 1000.
        in_channels (int): The number of input channels. Default: 3.

    Inputs:
        - **x** (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.

    Outputs:
        Tensor of shape :math:`(N, CLASSES_{out})`.
    """
    return _efficientnet("efficientnet_b6", 1.8, 2.6, 0.5, in_channels, num_classes, pretrained, **kwargs)
mindocr.models.backbones.mindcv_models.efficientnet.efficientnet_b7(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Constructs an EfficientNet B7 architecture from EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>_.

PARAMETER DESCRIPTION
pretrained

If True, returns a model pretrained on IMAGENET. Default: False.

TYPE: bool DEFAULT: False

num_classes

The number of classes. Default: 1000.

TYPE: int DEFAULT: 1000

in_channels

The number of input channels. Default: 3.

TYPE: int DEFAULT: 3

Inputs
  • x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs

Tensor of shape :math:`(N, CLASSES_{out})`.

Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
@register_model
def efficientnet_b7(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> EfficientNet:
    """
    Constructs an EfficientNet B7 architecture from
    `EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>`_.

    Args:
        pretrained (bool): If True, returns a model pretrained on IMAGENET. Default: False.
        num_classes (int): The number of classes. Default: 1000.
        in_channels (int): The number of input channels. Default: 3.

    Inputs:
        - **x** (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.

    Outputs:
        Tensor of shape :math:`(N, CLASSES_{out})`.
    """
    return _efficientnet("efficientnet_b7", 2.0, 3.1, 0.5, in_channels, num_classes, pretrained, **kwargs)
mindocr.models.backbones.mindcv_models.efficientnet.efficientnet_v2_l(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Constructs an EfficientNet V2-L architecture from EfficientNetV2: Smaller Models and Faster Training <https://arxiv.org/abs/2104.00298>_.

PARAMETER DESCRIPTION
pretrained

If True, returns a model pretrained on IMAGENET. Default: False.

TYPE: bool DEFAULT: False

num_classes

The number of classes. Default: 1000.

TYPE: int DEFAULT: 1000

in_channels

The number of input channels. Default: 3.

TYPE: int DEFAULT: 3

Inputs
  • x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs

Tensor of shape :math:`(N, CLASSES_{out})`.

Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
@register_model
def efficientnet_v2_l(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> EfficientNet:
    """
    Constructs an EfficientNet V2-L architecture from
    `EfficientNetV2: Smaller Models and Faster Training <https://arxiv.org/abs/2104.00298>`_.

    Args:
        pretrained (bool): If True, returns a model pretrained on IMAGENET. Default: False.
        num_classes (int): The number of classes. Default: 1000.
        in_channels (int): The number of input channels. Default: 3.

    Inputs:
        - **x** (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.

    Outputs:
        Tensor of shape :math:`(N, CLASSES_{out})`.
    """
    return _efficientnet("efficientnet_v2_l", 1.0, 1.0, 0.2, in_channels, num_classes, pretrained, **kwargs)
mindocr.models.backbones.mindcv_models.efficientnet.efficientnet_v2_m(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Constructs an EfficientNet V2-M architecture from EfficientNetV2: Smaller Models and Faster Training <https://arxiv.org/abs/2104.00298>_.

PARAMETER DESCRIPTION
pretrained

If True, returns a model pretrained on IMAGENET. Default: False.

TYPE: bool DEFAULT: False

num_classes

The number of classes. Default: 1000.

TYPE: int DEFAULT: 1000

in_channels

The number of input channels. Default: 3.

TYPE: int DEFAULT: 3

Inputs
  • x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs

Tensor of shape :math:`(N, CLASSES_{out})`.

Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
@register_model
def efficientnet_v2_m(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> EfficientNet:
    """
    Constructs an EfficientNet V2-M architecture from
    `EfficientNetV2: Smaller Models and Faster Training <https://arxiv.org/abs/2104.00298>`_.

    Args:
        pretrained (bool): If True, returns a model pretrained on IMAGENET. Default: False.
        num_classes (int): The number of classes. Default: 1000.
        in_channels (int): The number of input channels. Default: 3.

    Inputs:
        - **x** (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.

    Outputs:
        Tensor of shape :math:`(N, CLASSES_{out})`.
    """
    return _efficientnet("efficientnet_v2_m", 1.0, 1.0, 0.2, in_channels, num_classes, pretrained, **kwargs)
mindocr.models.backbones.mindcv_models.efficientnet.efficientnet_v2_s(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Constructs an EfficientNet V2-S architecture from EfficientNetV2: Smaller Models and Faster Training <https://arxiv.org/abs/2104.00298>_.

PARAMETER DESCRIPTION
pretrained

If True, returns a model pretrained on IMAGENET. Default: False.

TYPE: bool DEFAULT: False

num_classes

The number of classes. Default: 1000.

TYPE: int DEFAULT: 1000

in_channels

The number of input channels. Default: 3.

TYPE: int DEFAULT: 3

Inputs
  • x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs

Tensor of shape :math:`(N, CLASSES_{out})`.

Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
@register_model
def efficientnet_v2_s(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> EfficientNet:
    """
    Constructs an EfficientNet V2-S architecture from
    `EfficientNetV2: Smaller Models and Faster Training <https://arxiv.org/abs/2104.00298>`_.

    Args:
        pretrained (bool): If True, returns a model pretrained on IMAGENET. Default: False.
        num_classes (int): The number of classes. Default: 1000.
        in_channels (int): The number of input channels. Default: 3.

    Inputs:
        - **x** (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.

    Outputs:
        Tensor of shape :math:`(N, CLASSES_{out})`.
    """
    return _efficientnet("efficientnet_v2_s", 1.0, 1.0, 0.2, in_channels, num_classes, pretrained, **kwargs)
mindocr.models.backbones.mindcv_models.efficientnet.efficientnet_v2_xl(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Constructs an EfficientNet V2-XL architecture from EfficientNetV2: Smaller Models and Faster Training <https://arxiv.org/abs/2104.00298>_.

PARAMETER DESCRIPTION
pretrained

If True, returns a model pretrained on IMAGENET. Default: False.

TYPE: bool DEFAULT: False

num_classes

The number of classes. Default: 1000.

TYPE: int DEFAULT: 1000

in_channels

The number of input channels. Default: 3.

TYPE: int DEFAULT: 3

Inputs
  • x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs

Tensor of shape :math:`(N, CLASSES_{out})`.

Source code in mindocr\models\backbones\mindcv_models\efficientnet.py
@register_model
def efficientnet_v2_xl(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> EfficientNet:
    """
    Constructs an EfficientNet V2-XL architecture from
    `EfficientNetV2: Smaller Models and Faster Training <https://arxiv.org/abs/2104.00298>`_.

    Args:
        pretrained (bool): If True, returns a model pretrained on IMAGENET. Default: False.
        num_classes (int): The number of classes. Default: 1000.
        in_channels (int): The number of input channels. Default: 3.

    Inputs:
        - **x** (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.

    Outputs:
        Tensor of shape :math:`(N, CLASSES_{out})`.
    """
    return _efficientnet("efficientnet_v2_xl", 1.0, 1.0, 0.2, in_channels, num_classes, pretrained, **kwargs)
mindocr.models.backbones.mindcv_models.ghostnet

MindSpore implementation of GhostNet.

mindocr.models.backbones.mindcv_models.ghostnet.ConvBnAct

Bases: nn.Cell

A Conv-BN-activation block.

Source code in mindocr\models\backbones\mindcv_models\ghostnet.py
class ConvBnAct(nn.Cell):
    """A block for conv bn and relu"""

    def __init__(self, in_chs, out_chs, kernel_size,
                 stride=1, act_layer=nn.ReLU):
        super().__init__()
        self.conv = nn.Conv2d(in_chs, out_chs, kernel_size=kernel_size, stride=stride,
                              padding=kernel_size // 2, pad_mode="pad", has_bias=False)
        self.bn1 = nn.BatchNorm2d(out_chs)
        self.act1 = act_layer()

    def construct(self, x):
        x = self.conv(x)
        x = self.bn1(x)
        x = self.act1(x)
        return x
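ConvBnAct pads with kernel_size // 2, which preserves the spatial size for odd kernels at stride 1 and halves it at stride 2. The standard convolution output-size arithmetic can be checked directly (`conv_output_size` is an illustrative helper, not library API):

```python
def conv_output_size(size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    # floor((size + 2*padding - kernel_size) / stride) + 1
    return (size + 2 * padding - kernel_size) // stride + 1
```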
mindocr.models.backbones.mindcv_models.ghostnet.GhostGate

Bases: nn.Cell

Hard-sigmoid gate: relu6(x + 3) / 6

Source code in mindocr\models\backbones\mindcv_models\ghostnet.py
class GhostGate(nn.Cell):
    """Implementation for (relu6 + 3) / 6"""

    def __init__(self):
        super().__init__()
        self.relu6 = nn.ReLU6()

    def construct(self, x):
        return self.relu6(x + 3.0) * 0.16666667
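GhostGate computes the hard sigmoid relu6(x + 3) / 6, with the constant 0.16666667 approximating 1/6. A plain-Python equivalent for reference:

```python
def hard_sigmoid(x: float) -> float:
    # relu6(x + 3) / 6: a piecewise-linear approximation of sigmoid,
    # clamped to [0, 1], matching GhostGate above.
    return min(max(x + 3.0, 0.0), 6.0) / 6.0
```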
mindocr.models.backbones.mindcv_models.ghostnet.GhostNet

Bases: nn.Cell

GhostNet model class, based on "GhostNet: More Features from Cheap Operations " <https://arxiv.org/abs/1911.11907>_

PARAMETER DESCRIPTION
cfgs

the config of the GhostNet.

num_classes

number of classification classes. Default: 1000.

TYPE: int DEFAULT: 1000

in_channels

number of input channels. Default: 3.

TYPE: int DEFAULT: 3

width

base width of hidden channel in blocks. Default: 1.0

TYPE: float DEFAULT: 1.0

dropout

The dropout probability applied to the features before classification. Default: 0.2

Source code in mindocr\models\backbones\mindcv_models\ghostnet.py
class GhostNet(nn.Cell):
    r"""GhostNet model class, based on
    `"GhostNet: More Features from Cheap Operations " <https://arxiv.org/abs/1911.11907>`_

    Args:
        cfgs: the config of the GhostNet.
        num_classes: number of classification classes. Default: 1000.
        in_channels: number of input channels. Default: 3.
        width: base width of hidden channel in blocks. Default: 1.0
        dropout: dropout probability applied to the features before classification. Default: 0.2
    """

    def __init__(
        self,
        cfgs,
        num_classes: int = 1000,
        in_channels: int = 3,
        width: float = 1.0,
        dropout: float = 0.2,
    ) -> None:
        super().__init__()
        # setting of inverted residual blocks
        self.cfgs = cfgs
        self.dropout_rate = dropout

        # building first layer
        output_channel = make_divisible(16 * width, 4)
        self.conv_stem = nn.Conv2d(in_channels, output_channel, kernel_size=3,
                                   padding=1, stride=2, has_bias=False, pad_mode="pad")
        self.bn1 = nn.BatchNorm2d(output_channel)
        self.act1 = nn.ReLU()
        input_channel = output_channel

        # building inverted residual blocks
        stages = []
        block = GhostBottleneck
        exp_size = 128
        for cfg in self.cfgs:
            layers = []
            for k, exp_size, c, se_ratio, s in cfg:
                output_channel = make_divisible(c * width, 4)
                hidden_channel = make_divisible(exp_size * width, 4)
                layers.append(block(input_channel, hidden_channel, output_channel, k, s, se_ratio=se_ratio))
                input_channel = output_channel
            stages.append(nn.SequentialCell([*layers]))

        output_channel = make_divisible(exp_size * width, 4)
        stages.append(nn.SequentialCell([ConvBnAct(input_channel, output_channel, 1)]))
        input_channel = output_channel

        self.blocks = nn.SequentialCell([*stages])

        # building last several layers
        output_channel = 1280
        self.global_pool = GlobalAvgPooling(keep_dims=True)
        self.conv_head = nn.Conv2d(input_channel, output_channel, kernel_size=1,
                                   padding=0, stride=1, has_bias=True, pad_mode="pad")
        self.act2 = nn.ReLU()
        if self.dropout_rate > 0:
            self.dropout = nn.Dropout(self.dropout_rate)
        self.classifier = nn.Dense(output_channel, num_classes)
        self._initialize_weights()

    def _initialize_weights(self) -> None:
        """Initialize weights for cells."""
        self.init_parameters_data()
        for _, m in self.cells_and_names():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.set_data(Tensor(np.random.normal(0, np.sqrt(2.0 / n), m.weight.data.shape).astype("float32")))
                if m.bias is not None:
                    m.bias.set_data(Tensor(np.zeros(m.bias.data.shape, dtype="float32")))
            elif isinstance(m, nn.BatchNorm2d):
                m.gamma.set_data(Tensor(np.ones(m.gamma.data.shape, dtype="float32")))
                m.beta.set_data(Tensor(np.zeros(m.beta.data.shape, dtype="float32")))
            elif isinstance(m, nn.Dense):
                m.weight.set_data(Tensor(np.random.normal(0, 0.01, m.weight.data.shape).astype("float32")))
                if m.bias is not None:
                    m.bias.set_data(Tensor(np.zeros(m.bias.data.shape, dtype="float32")))

    def forward_features(self, x: Tensor) -> Tensor:
        x = self.conv_stem(x)
        x = self.bn1(x)
        x = self.act1(x)
        x = self.blocks(x)
        x = self.global_pool(x)
        return x

    def forward_head(self, x: Tensor) -> Tensor:
        x = self.conv_head(x)
        x = self.act2(x)
        x = ops.flatten(x)
        if self.dropout_rate > 0.0:
            x = self.dropout(x)
        x = self.classifier(x)
        return x

    def construct(self, x: Tensor) -> Tensor:
        x = self.forward_features(x)
        x = self.forward_head(x)
        return x
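Every scaled channel count above is rounded with `make_divisible(c * width, 4)`. A common implementation of that helper — a sketch under the usual convention (round to the nearest multiple, never dropping below 90% of the requested value), not necessarily mindcv's exact code:

```python
def make_divisible(v: float, divisor: int, min_value: int = None) -> int:
    """Round v to the nearest multiple of divisor without going below 90% of v."""
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:  # avoid shrinking channels by more than 10%
        new_v += divisor
    return new_v
```

With `width=1.0` the stem stays at 16 channels, while fractional widths land on the nearest multiple of 4.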
mindocr.models.backbones.mindcv_models.ghostnet.ghostnet_1x(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get GhostNet model. Refer to the base class 'models.GhostNet' for more details.

Source code in mindocr\models\backbones\mindcv_models\ghostnet.py
@register_model
def ghostnet_1x(pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3, **kwargs) -> GhostNet:
    """Get GhostNet model.
    Refer to the base class 'models.GhostNet' for more details.
    """
    model_args = model_cfgs["1x"]["cfg"]
    model = GhostNet(cfgs=model_args, num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, model_args, num_classes=num_classes, in_channels=in_channels)

    return model
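The `cfgs` consumed by `GhostNet` is a nested list: one inner list per stage, each row unpacked in the constructor as `(k, exp_size, c, se_ratio, s)` — kernel size, expansion channels, output channels, SE ratio, and stride. A minimal illustrative fragment (values hypothetical, not the real "1x" configuration):

```python
# Each row: (kernel, expansion channels, output channels, SE ratio, stride).
# Values below are illustrative only, not the actual model_cfgs["1x"] entries.
example_cfgs = [
    [(3, 16, 16, 0, 1)],                      # stage 1: stride 1, no SE
    [(3, 48, 24, 0, 2), (3, 72, 24, 0, 1)],   # stage 2: first block downsamples
]

# Flatten the rows the same way GhostNet.__init__ iterates over them.
rows = [row for stage in example_cfgs for row in stage]
strides = [s for k, exp_size, c, se_ratio, s in rows]
```

Only the first block of a stage carries stride 2, so each stage halves the spatial resolution at most once.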
mindocr.models.backbones.mindcv_models.ghostnet.ghostnet_nose_1x(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get GhostNet model without SEModule. Refer to the base class 'models.GhostNet' for more details.

Source code in mindocr\models\backbones\mindcv_models\ghostnet.py
@register_model
def ghostnet_nose_1x(pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3, **kwargs) -> GhostNet:
    """Get GhostNet model without SEModule.
    Refer to the base class 'models.GhostNet' for more details.
    """
    model_args = model_cfgs["nose_1x"]["cfg"]
    model = GhostNet(cfgs=model_args, num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, model_args, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.hrnet

MindSpore implementation of HRNet. Refer to Deep High-Resolution Representation Learning for Visual Recognition

mindocr.models.backbones.mindcv_models.hrnet.BasicBlock

Bases: nn.Cell

Basic block of HRNet

Source code in mindocr\models\backbones\mindcv_models\hrnet.py
class BasicBlock(nn.Cell):
    """Basic block of HRNet"""

    expansion: int = 1

    def __init__(
        self,
        in_channels: int,
        channels: int,
        stride: int = 1,
        groups: int = 1,
        base_width: int = 64,
        norm: Optional[nn.Cell] = None,
        down_sample: Optional[nn.Cell] = None,
    ) -> None:
        super().__init__()
        if norm is None:
            norm = nn.BatchNorm2d
        assert groups == 1, "BasicBlock only supports groups=1"
        assert base_width == 64, "BasicBlock only supports base_width=64"

        self.conv1 = nn.Conv2d(
            in_channels,
            channels,
            kernel_size=3,
            stride=stride,
            padding=1,
            pad_mode="pad",
        )
        self.bn1 = norm(channels)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(
            channels, channels, kernel_size=3, stride=1, padding=1, pad_mode="pad"
        )
        self.bn2 = norm(channels)
        self.down_sample = down_sample

    def construct(self, x: Tensor) -> Tensor:
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.down_sample is not None:
            identity = self.down_sample(x)

        out += identity
        out = self.relu(out)

        return out
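Both convolutions in the block use `kernel_size=3` with `padding=1`, so spatial size changes only through `stride`. The standard output-size formula, `out = floor((in + 2*pad - k) / stride) + 1`, confirms that stride 1 preserves the resolution and stride 2 halves it (rounding up for odd inputs):

```python
def conv2d_out(size: int, kernel: int = 3, stride: int = 1, pad: int = 1) -> int:
    """Output spatial size of a padded 2-D convolution along one dimension."""
    return (size + 2 * pad - kernel) // stride + 1
```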
mindocr.models.backbones.mindcv_models.hrnet.Bottleneck

Bases: nn.Cell

Bottleneck block of HRNet

Source code in mindocr\models\backbones\mindcv_models\hrnet.py
class Bottleneck(nn.Cell):
    """Bottleneck block of HRNet"""

    expansion: int = 4

    def __init__(
        self,
        in_channels: int,
        channels: int,
        stride: int = 1,
        groups: int = 1,
        base_width: int = 64,
        norm: Optional[nn.Cell] = None,
        down_sample: Optional[nn.Cell] = None,
    ) -> None:
        super().__init__()
        if norm is None:
            norm = nn.BatchNorm2d

        width = int(channels * (base_width / 64.0)) * groups

        self.conv1 = nn.Conv2d(in_channels, width, kernel_size=1, stride=1)
        self.bn1 = norm(width)
        self.conv2 = nn.Conv2d(
            width,
            width,
            kernel_size=3,
            stride=stride,
            padding=1,
            pad_mode="pad",
            group=groups,
        )
        self.bn2 = norm(width)
        self.conv3 = nn.Conv2d(
            width, channels * self.expansion, kernel_size=1, stride=1
        )
        self.bn3 = norm(channels * self.expansion)
        self.relu = nn.ReLU()
        self.down_sample = down_sample

    def construct(self, x: Tensor) -> Tensor:
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.down_sample is not None:
            identity = self.down_sample(x)

        out += identity
        out = self.relu(out)

        return out
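The hidden width of the bottleneck scales with `base_width` and `groups`, while `conv3` expands the output to `channels * expansion`. A plain-Python restatement of the arithmetic in `__init__`:

```python
EXPANSION = 4  # Bottleneck.expansion


def bottleneck_width(channels: int, base_width: int = 64, groups: int = 1) -> int:
    """Hidden width used by conv1/conv2, as in Bottleneck.__init__."""
    return int(channels * (base_width / 64.0)) * groups


def bottleneck_out_channels(channels: int) -> int:
    """Output channels produced by conv3 (channels * expansion)."""
    return channels * EXPANSION
```

With the defaults (`base_width=64`, `groups=1`) the hidden width equals `channels`, so a 64-channel bottleneck emits 256 channels.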
mindocr.models.backbones.mindcv_models.hrnet.HRModule

Bases: nn.Cell

High-Resolution Module for HRNet. In this module, every branch has 4 BasicBlocks/Bottlenecks. Fusion/Exchange is in this module.

Source code in mindocr\models\backbones\mindcv_models\hrnet.py
class HRModule(nn.Cell):
    """High-Resolution Module for HRNet.
    In this module, every branch has 4 BasicBlocks/Bottlenecks. Fusion/Exchange
    is in this module.
    """

    def __init__(
        self,
        num_branches: int,
        block: Type[Union[BasicBlock, Bottleneck]],
        num_blocks: List[int],
        num_inchannels: List[int],
        num_channels: List[int],
        multi_scale_output: bool = True,
    ) -> None:
        super().__init__()
        self._check_branches(num_branches, num_blocks, num_inchannels, num_channels)

        self.num_inchannels = num_inchannels
        self.num_branches = num_branches

        self.multi_scale_output = multi_scale_output

        self.branches = self._make_branches(
            num_branches, block, num_blocks, num_channels
        )
        self.fuse_layers = self._make_fuse_layers()
        self.relu = nn.ReLU()

    @staticmethod
    def _check_branches(
        num_branches: int,
        num_blocks: List[int],
        num_inchannels: List[int],
        num_channels: List[int],
    ) -> None:
        """Check input to avoid ValueError."""
        if num_branches != len(num_blocks):
            error_msg = f"NUM_BRANCHES({num_branches})!= NUM_BLOCKS({len(num_blocks)})"
            raise ValueError(error_msg)

        if num_branches != len(num_channels):
            error_msg = (
                f"NUM_BRANCHES({num_branches})!= NUM_CHANNELS({len(num_channels)})"
            )
            raise ValueError(error_msg)

        if num_branches != len(num_inchannels):
            error_msg = (
                f"NUM_BRANCHES({num_branches}) != NUM_INCHANNELS({len(num_inchannels)})"
            )
            raise ValueError(error_msg)

    def _make_one_branch(
        self,
        branch_index: int,
        block: Type[Union[BasicBlock, Bottleneck]],
        num_blocks: List[int],
        num_channels: List[int],
        stride: int = 1,
    ) -> nn.SequentialCell:
        downsample = None
        if stride != 1 or self.num_inchannels[branch_index] != num_channels[branch_index] * block.expansion:
            downsample = nn.SequentialCell(
                nn.Conv2d(
                    self.num_inchannels[branch_index],
                    num_channels[branch_index] * block.expansion,
                    kernel_size=1,
                    stride=stride,
                ),
                nn.BatchNorm2d(num_channels[branch_index] * block.expansion),
            )

        layers = []
        layers.append(
            block(
                self.num_inchannels[branch_index],
                num_channels[branch_index],
                stride,
                down_sample=downsample,
            )
        )
        self.num_inchannels[branch_index] = num_channels[branch_index] * block.expansion
        for _ in range(1, num_blocks[branch_index]):
            layers.append(
                block(self.num_inchannels[branch_index], num_channels[branch_index])
            )

        return nn.SequentialCell(layers)

    def _make_branches(
        self,
        num_branches: int,
        block: Type[Union[BasicBlock, Bottleneck]],
        num_blocks: List[int],
        num_channels: List[int],
    ) -> nn.CellList:
        """Make branches."""
        branches = []

        for i in range(num_branches):
            branches.append(self._make_one_branch(i, block, num_blocks, num_channels))

        return nn.CellList(branches)

    def _make_fuse_layers(self) -> nn.CellList:
        if self.num_branches == 1:
            return None

        num_branches = self.num_branches
        num_inchannels = self.num_inchannels
        fuse_layers = []
        for i in range(num_branches if self.multi_scale_output else 1):
            fuse_layer = []
            for j in range(num_branches):
                if j > i:
                    fuse_layer.append(
                        nn.SequentialCell(
                            nn.Conv2d(
                                num_inchannels[j], num_inchannels[i], kernel_size=1
                            ),
                            nn.BatchNorm2d(num_inchannels[i]),
                        )
                    )
                elif j == i:
                    fuse_layer.append(IdentityCell())
                else:
                    conv3x3s = []
                    for k in range(i - j):
                        if k == i - j - 1:
                            num_outchannels_conv3x3 = num_inchannels[i]
                            conv3x3s.append(
                                nn.SequentialCell(
                                    nn.Conv2d(
                                        num_inchannels[j],
                                        num_outchannels_conv3x3,
                                        kernel_size=3,
                                        stride=2,
                                        padding=1,
                                        pad_mode="pad",
                                    ),
                                    nn.BatchNorm2d(num_outchannels_conv3x3),
                                )
                            )
                        else:
                            num_outchannels_conv3x3 = num_inchannels[j]
                            conv3x3s.append(
                                nn.SequentialCell(
                                    nn.Conv2d(
                                        num_inchannels[j],
                                        num_outchannels_conv3x3,
                                        kernel_size=3,
                                        stride=2,
                                        padding=1,
                                        pad_mode="pad",
                                    ),
                                    nn.BatchNorm2d(num_outchannels_conv3x3),
                                    nn.ReLU(),
                                )
                            )
                    fuse_layer.append(nn.SequentialCell(conv3x3s))
            fuse_layers.append(nn.CellList(fuse_layer))

        return nn.CellList(fuse_layers)

    def construct(self, x: List[Tensor]) -> List[Tensor]:
        if self.num_branches == 1:
            return [self.branches[0](x[0])]

        for i in range(self.num_branches):
            x[i] = self.branches[i](x[i])

        x_fuse = []

        for i in range(len(self.fuse_layers)):
            y = x[0] if i == 0 else self.fuse_layers[i][0](x[0])
            for j in range(1, self.num_branches):
                if i == j:
                    y = y + x[j]
                elif j > i:
                    _, _, height, width = x[i].shape
                    t = self.fuse_layers[i][j](x[j])
                    t = ops.ResizeNearestNeighbor((height, width))(t)
                    y = y + t
                else:
                    y = y + self.fuse_layers[i][j](x[j])
            x_fuse.append(self.relu(y))

        if not self.multi_scale_output:
            x_fuse = x_fuse[0]

        return x_fuse
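In `construct`, lower-resolution branches are projected by a 1x1 conv + BN and then upsampled with `ops.ResizeNearestNeighbor` to the target branch's size. The index mapping of nearest-neighbor resizing can be sketched in plain Python (an illustration of the standard floor mapping, not MindSpore's implementation):

```python
def resize_nearest(grid, out_h, out_w):
    """Nearest-neighbor resize of a 2-D grid: source index = dst_index * src / dst."""
    in_h, in_w = len(grid), len(grid[0])
    return [
        [grid[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]
```

Upsampling a 2x2 map to 4x4 simply repeats each source pixel in a 2x2 block, which is why fusion can use a cheap elementwise add afterwards.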
mindocr.models.backbones.mindcv_models.hrnet.HRNet

Bases: nn.Cell

HRNet Backbone, based on "Deep High-Resolution Representation Learning for Visual Recognition" <https://arxiv.org/abs/1908.07919>.

PARAMETER DESCRIPTION
stage_cfg

Configuration of the extra blocks. It accepts a dictionary storing the detailed config of each stage, which includes num_modules, num_branches, block, num_blocks, num_channels. For a detailed example, please check the implementation of hrnet_w32 and hrnet_w48.

TYPE: Dict[str, Dict[str, int]]

num_classes

number of classification classes. Default: 1000.

TYPE: int DEFAULT: 1000

in_channels

Number of input channels. Default: 3.

TYPE: int DEFAULT: 3

Source code in mindocr\models\backbones\mindcv_models\hrnet.py
class HRNet(nn.Cell):
    r"""HRNet Backbone, based on
    `"Deep High-Resolution Representation Learning for Visual Recognition"
    <https://arxiv.org/abs/1908.07919>`_.

    Args:
        stage_cfg: Configuration of the extra blocks. It accepts a dictionary
            storing the detailed config of each stage, which includes `num_modules`,
            `num_branches`, `block`, `num_blocks`, `num_channels`. For a detailed
            example, please check the implementation of `hrnet_w32` and `hrnet_w48`.
        num_classes: number of classification classes. Default: 1000.
        in_channels: Number of input channels. Default: 3.
    """

    blocks_dict = {"BASIC": BasicBlock, "BOTTLENECK": Bottleneck}

    def __init__(
        self,
        stage_cfg: Dict[str, Dict[str, int]],
        num_classes: int = 1000,
        in_channels: int = 3,
    ) -> None:
        super().__init__()

        self.stage_cfg = stage_cfg
        # stem net
        self.conv1 = nn.Conv2d(
            in_channels, 64, kernel_size=3, stride=2, padding=1, pad_mode="pad"
        )
        self.bn1 = nn.BatchNorm2d(64)
        self.conv2 = nn.Conv2d(
            64, 64, kernel_size=3, stride=2, padding=1, pad_mode="pad"
        )
        self.bn2 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU()

        # stage 1
        self.stage1_cfg = self.stage_cfg["stage1"]
        num_channels = self.stage1_cfg["num_channels"][0]
        num_blocks = self.stage1_cfg["num_blocks"][0]
        block = self.blocks_dict[self.stage1_cfg["block"]]
        self.layer1 = self._make_layer(block, 64, num_channels, num_blocks)

        # stage 2
        self.stage2_cfg = self.stage_cfg["stage2"]
        num_channels = self.stage2_cfg["num_channels"]
        block = self.blocks_dict[self.stage2_cfg["block"]]
        num_channels = [
            num_channels[i] * block.expansion for i in range(len(num_channels))
        ]

        self.transition1, self.transition1_flags = self._make_transition_layer(
            [256], num_channels
        )
        self.stage2, pre_stage_channels = self._make_stage(
            self.stage2_cfg, num_channels
        )

        # stage 3
        self.stage3_cfg = self.stage_cfg["stage3"]
        num_channels = self.stage3_cfg["num_channels"]
        block = self.blocks_dict[self.stage3_cfg["block"]]
        num_channels = [
            num_channels[i] * block.expansion for i in range(len(num_channels))
        ]

        self.transition2, self.transition2_flags = self._make_transition_layer(
            pre_stage_channels, num_channels
        )
        self.stage3, pre_stage_channels = self._make_stage(
            self.stage3_cfg, num_channels
        )

        # stage 4
        self.stage4_cfg = self.stage_cfg["stage4"]
        num_channels = self.stage4_cfg["num_channels"]
        block = self.blocks_dict[self.stage4_cfg["block"]]
        num_channels = [
            num_channels[i] * block.expansion for i in range(len(num_channels))
        ]
        self.transition3, self.transition3_flags = self._make_transition_layer(
            pre_stage_channels, num_channels
        )
        self.stage4, pre_stage_channels = self._make_stage(
            self.stage4_cfg, num_channels
        )

        # head
        self.pool = GlobalAvgPooling()
        self.incre_modules, self.downsample_modules, self.final_layer = self._make_head(
            pre_stage_channels
        )
        self.classifier = nn.Dense(2048, num_classes)

    def _make_head(self, pre_stage_channels: List[int]):
        head_block = Bottleneck
        head_channels = [32, 64, 128, 256]

        # increase the #channels on each resolution
        # from C, 2C, 4C, 8C to 128, 256, 512, 1024
        incre_modules = list()
        for i, channels in enumerate(pre_stage_channels):
            incre_module = self._make_layer(
                head_block, channels, head_channels[i], 1, stride=1
            )
            incre_modules.append(incre_module)
        incre_modules = nn.CellList(incre_modules)

        # downsample modules
        downsamp_modules = []
        for i in range(len(pre_stage_channels) - 1):
            in_channels = head_channels[i] * head_block.expansion
            out_channels = head_channels[i + 1] * head_block.expansion

            downsamp_module = nn.SequentialCell(
                nn.Conv2d(
                    in_channels=in_channels,
                    out_channels=out_channels,
                    kernel_size=3,
                    stride=2,
                    pad_mode="pad",
                    padding=1,
                ),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(),
            )

            downsamp_modules.append(downsamp_module)
        downsamp_modules = nn.CellList(downsamp_modules)

        final_layer = nn.SequentialCell(
            nn.Conv2d(
                in_channels=head_channels[3] * head_block.expansion,
                out_channels=2048,
                kernel_size=1,
                stride=1,
                padding=0,
            ),
            nn.BatchNorm2d(2048),
            nn.ReLU(),
        )

        return incre_modules, downsamp_modules, final_layer

    def _make_transition_layer(
        self, num_channels_pre_layer: List[int], num_channels_cur_layer: List[int]
    ) -> Tuple[nn.CellList, List[bool]]:
        num_branches_cur = len(num_channels_cur_layer)
        num_branches_pre = len(num_channels_pre_layer)

        transition_layers = []
        transition_layers_flags = []
        for i in range(num_branches_cur):
            if i < num_branches_pre:
                if num_channels_cur_layer[i] != num_channels_pre_layer[i]:
                    transition_layers.append(
                        nn.SequentialCell(
                            nn.Conv2d(
                                num_channels_pre_layer[i],
                                num_channels_cur_layer[i],
                                kernel_size=3,
                                padding=1,
                                pad_mode="pad",
                            ),
                            nn.BatchNorm2d(num_channels_cur_layer[i]),
                            nn.ReLU(),
                        )
                    )
                    transition_layers_flags.append(True)
                else:
                    transition_layers.append(IdentityCell())
                    transition_layers_flags.append(False)
            else:
                conv3x3s = []
                for j in range(i + 1 - num_branches_pre):
                    inchannels = num_channels_pre_layer[-1]
                    outchannels = (
                        num_channels_cur_layer[i]
                        if j == i - num_branches_pre
                        else inchannels
                    )
                    conv3x3s.append(
                        nn.SequentialCell(
                            [
                                nn.Conv2d(
                                    inchannels,
                                    outchannels,
                                    kernel_size=3,
                                    stride=2,
                                    padding=1,
                                    pad_mode="pad",
                                ),
                                nn.BatchNorm2d(outchannels),
                                nn.ReLU(),
                            ]
                        )
                    )
                transition_layers.append(nn.SequentialCell(conv3x3s))
                transition_layers_flags.append(True)

        return nn.CellList(transition_layers), transition_layers_flags

    def _make_layer(
        self,
        block: Type[Union[BasicBlock, Bottleneck]],
        in_channels: int,
        out_channels: int,
        blocks: int,
        stride: int = 1,
    ) -> nn.SequentialCell:
        downsample = None
        if stride != 1 or in_channels != out_channels * block.expansion:
            downsample = nn.SequentialCell(
                nn.Conv2d(
                    in_channels,
                    out_channels * block.expansion,
                    kernel_size=1,
                    stride=stride,
                ),
                nn.BatchNorm2d(out_channels * block.expansion),
            )

        layers = []
        layers.append(block(in_channels, out_channels, stride, down_sample=downsample))
        for _ in range(1, blocks):
            layers.append(block(out_channels * block.expansion, out_channels))

        return nn.SequentialCell(layers)

    def _make_stage(
        self,
        layer_config: Dict[str, int],
        num_inchannels: int,
        multi_scale_output: bool = True,
    ) -> Tuple[nn.SequentialCell, List[int]]:
        num_modules = layer_config["num_modules"]
        num_branches = layer_config["num_branches"]
        num_blocks = layer_config["num_blocks"]
        num_channels = layer_config["num_channels"]
        block = self.blocks_dict[layer_config["block"]]

        modules = []
        for i in range(num_modules):
            # multi_scale_output is only used by the last module
            if not multi_scale_output and i == num_modules - 1:
                reset_multi_scale_output = False
            else:
                reset_multi_scale_output = True

            modules.append(
                HRModule(
                    num_branches,
                    block,
                    num_blocks,
                    num_inchannels,
                    num_channels,
                    reset_multi_scale_output,
                )
            )
            num_inchannels = modules[-1].num_inchannels

        return nn.SequentialCell(modules), num_inchannels

    def forward_features(self, x: Tensor) -> Tensor:
        """Perform the feature extraction.

        Args:
            x: Tensor

        Returns:
            Extracted feature
        """
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)

        # stage 1
        x = self.layer1(x)

        # stage 2
        x_list = []
        for i in range(self.stage2_cfg["num_branches"]):
            if self.transition1_flags[i]:
                x_list.append(self.transition1[i](x))
            else:
                x_list.append(x)
        y_list = self.stage2(x_list)

        # stage 3
        x_list = []
        for i in range(self.stage3_cfg["num_branches"]):
            if self.transition2_flags[i]:
                x_list.append(self.transition2[i](y_list[-1]))
            else:
                x_list.append(y_list[i])
        y_list = self.stage3(x_list)

        # stage 4
        x_list = []
        for i in range(self.stage4_cfg["num_branches"]):
            if self.transition3_flags[i]:
                x_list.append(self.transition3[i](y_list[-1]))
            else:
                x_list.append(y_list[i])
        y = self.stage4(x_list)

        return y

    def forward_head(self, x: List[Tensor]) -> Tensor:
        y = self.incre_modules[0](x[0])
        for i in range(len(self.downsample_modules)):
            y = self.incre_modules[i + 1](x[i + 1]) + self.downsample_modules[i](y)

        y = self.final_layer(y)
        y = self.pool(y)
        y = self.classifier(y)
        return y

    def construct(self, x: Tensor) -> Tensor:
        x = self.forward_features(x)
        x = self.forward_head(x)
        return x
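The shape of a `stage_cfg` accepted by `HRNet` can be illustrated as follows (values are hypothetical, loosely modeled on typical HRNet variants — see `hrnet_w32` and `hrnet_w48` for the real configs). The per-stage consistency constraint mirrors `HRModule._check_branches`:

```python
# Illustrative stage_cfg; values are hypothetical, not the real hrnet_w32 config.
example_stage_cfg = {
    "stage1": {"num_modules": 1, "num_branches": 1, "block": "BOTTLENECK",
               "num_blocks": [4], "num_channels": [64]},
    "stage2": {"num_modules": 1, "num_branches": 2, "block": "BASIC",
               "num_blocks": [4, 4], "num_channels": [32, 64]},
    "stage3": {"num_modules": 4, "num_branches": 3, "block": "BASIC",
               "num_blocks": [4, 4, 4], "num_channels": [32, 64, 128]},
    "stage4": {"num_modules": 3, "num_branches": 4, "block": "BASIC",
               "num_blocks": [4, 4, 4, 4], "num_channels": [32, 64, 128, 256]},
}


def check_stage_cfg(cfg):
    """Per-stage list lengths must equal num_branches (cf. HRModule._check_branches)."""
    for stage in cfg.values():
        n = stage["num_branches"]
        assert len(stage["num_blocks"]) == n and len(stage["num_channels"]) == n
    return True
```

Each new stage adds one branch at half the resolution of the previous finest branch, which is why `num_branches` grows by one per stage.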
mindocr.models.backbones.mindcv_models.hrnet.HRNet.forward_features(x)

Perform the feature extraction.

PARAMETER DESCRIPTION
x

Tensor

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

Extracted feature

Source code in mindocr\models\backbones\mindcv_models\hrnet.py
def forward_features(self, x: Tensor) -> Tensor:
    """Perform the feature extraction.

    Args:
        x: Tensor

    Returns:
        Extracted feature
    """
    x = self.conv1(x)
    x = self.bn1(x)
    x = self.relu(x)
    x = self.conv2(x)
    x = self.bn2(x)
    x = self.relu(x)

    # stage 1
    x = self.layer1(x)

    # stage 2
    x_list = []
    for i in range(self.stage2_cfg["num_branches"]):
        if self.transition1_flags[i]:
            x_list.append(self.transition1[i](x))
        else:
            x_list.append(x)
    y_list = self.stage2(x_list)

    # stage 3
    x_list = []
    for i in range(self.stage3_cfg["num_branches"]):
        if self.transition2_flags[i]:
            x_list.append(self.transition2[i](y_list[-1]))
        else:
            x_list.append(y_list[i])
    y_list = self.stage3(x_list)

    # stage 4
    x_list = []
    for i in range(self.stage4_cfg["num_branches"]):
        if self.transition3_flags[i]:
            x_list.append(self.transition3[i](y_list[-1]))
        else:
            x_list.append(y_list[i])
    y = self.stage4(x_list)

    return y
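The stage-to-stage wiring above repeats one pattern: for each target branch `i`, either reuse the previous stage's `i`-th output, or create a new branch by applying a transition module to the previous stage's last (lowest-resolution) output. A pure-Python sketch of that branching logic, with plain numbers standing in for tensors and a halving lambda standing in for a downsampling transition:

```python
def apply_transitions(prev_outputs, transitions, flags):
    """Mimic HRNet's stage wiring: branch i either passes through the
    previous stage's i-th output, or derives a new tensor by applying
    transitions[i] to the previous stage's last output."""
    x_list = []
    for i, has_transition in enumerate(flags):
        if has_transition:
            x_list.append(transitions[i](prev_outputs[-1]))
        else:
            x_list.append(prev_outputs[i])
    return x_list

# Two existing branches; a third is created from the last one (here: halved).
prev = [10.0, 20.0]
flags = [False, False, True]
transitions = [None, None, lambda t: t / 2]
print(apply_transitions(prev, transitions, flags))  # [10.0, 20.0, 10.0]
```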
mindocr.models.backbones.mindcv_models.hrnet.IdentityCell

Bases: nn.Cell

Identity Cell

Source code in mindocr\models\backbones\mindcv_models\hrnet.py
class IdentityCell(nn.Cell):
    """Identity Cell"""

    def __init__(self) -> None:
        super().__init__()

    def construct(self, x: Any) -> Any:
        return x
mindocr.models.backbones.mindcv_models.hrnet.hrnet_w32(pretrained=False, num_classes=1000, in_channels=3)

Get HRNet with width=32 model. Refer to the base class models.HRNet for more details.

PARAMETER DESCRIPTION
pretrained

Whether the model is pretrained. Default: False

TYPE: bool DEFAULT: False

num_classes

number of classification classes. Default: 1000

TYPE: int DEFAULT: 1000

in_channels

Number of input channels. Default: 3

TYPE: int DEFAULT: 3

RETURNS DESCRIPTION
HRNet

HRNet model

Source code in mindocr\models\backbones\mindcv_models\hrnet.py
@register_model
def hrnet_w32(
    pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3
) -> HRNet:
    """Get HRNet with width=32 model.
    Refer to the base class `models.HRNet` for more details.

    Args:
        pretrained: Whether the model is pretrained. Default: False
        num_classes: number of classification classes. Default: 1000
        in_channels: Number of input channels. Default: 3

    Returns:
        HRNet model
    """
    default_cfg = default_cfgs["hrnet_w32"]
    stage_cfg = dict(
        stage1=dict(
            num_modules=1,
            num_branches=1,
            block="BOTTLENECK",
            num_blocks=[4],
            num_channels=[64],
        ),
        stage2=dict(
            num_modules=1,
            num_branches=2,
            block="BASIC",
            num_blocks=[4, 4],
            num_channels=[32, 64],
        ),
        stage3=dict(
            num_modules=4,
            num_branches=3,
            block="BASIC",
            num_blocks=[4, 4, 4],
            num_channels=[32, 64, 128],
        ),
        stage4=dict(
            num_modules=3,
            num_branches=4,
            block="BASIC",
            num_blocks=[4, 4, 4, 4],
            num_channels=[32, 64, 128, 256],
        ),
    )
    model = HRNet(stage_cfg, num_classes=num_classes, in_channels=in_channels)
    if pretrained:
        load_pretrained(
            model, default_cfg, num_classes=num_classes, in_channels=in_channels
        )

    return model
mindocr.models.backbones.mindcv_models.hrnet.hrnet_w48(pretrained=False, num_classes=1000, in_channels=3)

Get HRNet with width=48 model. Refer to the base class models.HRNet for more details.

PARAMETER DESCRIPTION
pretrained

Whether the model is pretrained. Default: False

TYPE: bool DEFAULT: False

num_classes

number of classification classes. Default: 1000

TYPE: int DEFAULT: 1000

in_channels

Number of input channels. Default: 3

TYPE: int DEFAULT: 3

RETURNS DESCRIPTION
HRNet

HRNet model

Source code in mindocr\models\backbones\mindcv_models\hrnet.py
@register_model
def hrnet_w48(
    pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3
) -> HRNet:
    """Get HRNet with width=48 model.
    Refer to the base class `models.HRNet` for more details.

    Args:
        pretrained: Whether the model is pretrained. Default: False
        num_classes: number of classification classes. Default: 1000
        in_channels: Number of input channels. Default: 3

    Returns:
        HRNet model
    """
    default_cfg = default_cfgs["hrnet_w48"]
    stage_cfg = dict(
        stage1=dict(
            num_modules=1,
            num_branches=1,
            block="BOTTLENECK",
            num_blocks=[4],
            num_channels=[64],
        ),
        stage2=dict(
            num_modules=1,
            num_branches=2,
            block="BASIC",
            num_blocks=[4, 4],
            num_channels=[48, 96],
        ),
        stage3=dict(
            num_modules=4,
            num_branches=3,
            block="BASIC",
            num_blocks=[4, 4, 4],
            num_channels=[48, 96, 192],
        ),
        stage4=dict(
            num_modules=3,
            num_branches=4,
            block="BASIC",
            num_blocks=[4, 4, 4, 4],
            num_channels=[48, 96, 192, 384],
        ),
    )
    model = HRNet(stage_cfg, num_classes=num_classes, in_channels=in_channels)
    if pretrained:
        load_pretrained(
            model, default_cfg, num_classes=num_classes, in_channels=in_channels
        )

    return model
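Comparing the two `stage_cfg` dicts above shows that `hrnet_w32` and `hrnet_w48` differ only in the base width W: every stage's channel list is [W, 2W, 4W, 8W] truncated to its branch count. That doubling pattern can be expressed directly:

```python
def hrnet_stage_channels(width, num_branches):
    # Channel counts double with each additional (lower-resolution) branch.
    return [width * 2 ** i for i in range(num_branches)]

assert hrnet_stage_channels(32, 4) == [32, 64, 128, 256]  # stage4 of hrnet_w32
assert hrnet_stage_channels(48, 4) == [48, 96, 192, 384]  # stage4 of hrnet_w48
```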
mindocr.models.backbones.mindcv_models.layers

layers init

mindocr.models.backbones.mindcv_models.layers.activation

Custom operators.

mindocr.models.backbones.mindcv_models.layers.activation.Swish

Bases: nn.Cell

Swish activation function: x * sigmoid(x).

Return

Tensor

Example

>>> x = Tensor(((20, 16), (50, 50)), mindspore.float32)
>>> Swish()(x)

Source code in mindocr\models\backbones\mindcv_models\layers\activation.py
class Swish(nn.Cell):
    """
    Swish activation function: x * sigmoid(x).

    Args:
        None

    Return:
        Tensor

    Example:
        >>> x = Tensor(((20, 16), (50, 50)), mindspore.float32)
        >>> Swish()(x)
    """

    def __init__(self):
        super().__init__()
        self.result = None
        self.sigmoid = nn.Sigmoid()

    def construct(self, x):
        result = x * self.sigmoid(x)
        return result
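For a scalar, the computation the cell performs is simply x times the logistic sigmoid of x; a pure-Python sketch (no MindSpore required) of the same formula:

```python
import math

def swish(x):
    # Swish(x) = x * sigmoid(x); near 0 it is ~x/2, for large x it approaches x.
    return x * (1.0 / (1.0 + math.exp(-x)))

print(swish(0.0))  # 0.0
```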
mindocr.models.backbones.mindcv_models.layers.conv_norm_act

Conv2d + BN + Act

mindocr.models.backbones.mindcv_models.layers.conv_norm_act.Conv2dNormActivation

Bases: nn.Cell

Conv2d + BN + Act

Source code in mindocr\models\backbones\mindcv_models\layers\conv_norm_act.py
class Conv2dNormActivation(nn.Cell):
    """Conv2d + BN + Act"""

    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        kernel_size: int = 3,
        stride: int = 1,
        pad_mode: str = "pad",
        padding: Optional[int] = None,
        dilation: int = 1,
        groups: int = 1,
        norm: Optional[nn.Cell] = nn.BatchNorm2d,
        activation: Optional[nn.Cell] = nn.ReLU,
        has_bias: Optional[bool] = None,
        **kwargs
    ) -> None:
        super().__init__()

        if pad_mode == "pad":
            if padding is None:
                padding = ((stride - 1) + dilation * (kernel_size - 1)) // 2
        else:
            padding = 0

        if has_bias is None:
            has_bias = norm is None

        layers = [
            nn.Conv2d(
                in_channels,
                out_channels,
                kernel_size,
                stride,
                pad_mode=pad_mode,
                padding=padding,
                dilation=dilation,
                group=groups,
                has_bias=has_bias,
                **kwargs
            )
        ]

        if norm:
            layers.append(norm(out_channels))
        if activation:
            layers.append(activation())

        self.features = nn.SequentialCell(layers)

    def construct(self, x):
        output = self.features(x)
        return output
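When `pad_mode="pad"` and no explicit padding is given, the cell derives the padding from the kernel size, stride, and dilation so that a stride-1 convolution preserves spatial size. The same one-line rule, pulled out for inspection:

```python
def same_padding(kernel_size, stride=1, dilation=1):
    # Default padding rule from Conv2dNormActivation: for stride=1 this
    # keeps the output spatial size equal to the input spatial size.
    return ((stride - 1) + dilation * (kernel_size - 1)) // 2

assert same_padding(3) == 1              # 3x3 conv -> pad 1
assert same_padding(5) == 2              # 5x5 conv -> pad 2
assert same_padding(3, dilation=2) == 2  # dilated 3x3 behaves like a 5x5
```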
mindocr.models.backbones.mindcv_models.layers.drop_path

DropPath: MindSpore implementation of DropPath (Stochastic Depth) regularization layers. Paper: Deep Networks with Stochastic Depth (https://arxiv.org/abs/1603.09382)

mindocr.models.backbones.mindcv_models.layers.drop_path.DropPath

Bases: nn.Cell

DropPath (Stochastic Depth) regularization layers

Source code in mindocr\models\backbones\mindcv_models\layers\drop_path.py
class DropPath(nn.Cell):
    """DropPath (Stochastic Depth) regularization layers"""

    def __init__(
        self,
        drop_prob: float = 0.0,
        scale_by_keep: bool = True,
    ) -> None:
        super().__init__()
        self.keep_prob = 1.0 - drop_prob
        self.scale_by_keep = scale_by_keep
        self.dropout = nn.Dropout(self.keep_prob)

    def construct(self, x: Tensor) -> Tensor:
        if self.keep_prob == 1.0 or not self.training:
            return x
        shape = (x.shape[0],) + (1,) * (x.ndim - 1)
        random_tensor = self.dropout(ones(shape))
        if not self.scale_by_keep:
            random_tensor = ops.mul(random_tensor, self.keep_prob)
        return x * random_tensor
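The cell drops entire samples (not individual activations): each sample in the batch is zeroed with probability `drop_prob`, and survivors are rescaled by 1/keep_prob so the expected value is unchanged. A pure-Python sketch of that per-sample logic, with one number standing in for each sample:

```python
import random

def drop_path(values, drop_prob, training=True, scale_by_keep=True):
    """Stochastic depth over a list of per-sample values: zero a sample
    with probability drop_prob; rescale survivors by 1/keep_prob."""
    keep_prob = 1.0 - drop_prob
    if keep_prob == 1.0 or not training:
        return list(values)  # identity at inference or when drop_prob == 0
    out = []
    for v in values:
        if random.random() < keep_prob:  # sample survives
            out.append(v / keep_prob if scale_by_keep else v)
        else:
            out.append(0.0)
    return out

print(drop_path([1.0, 2.0], drop_prob=0.0))  # [1.0, 2.0] (identity)
```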
mindocr.models.backbones.mindcv_models.layers.helpers

Layer/Module Helpers

mindocr.models.backbones.mindcv_models.layers.identity

Identity Module

mindocr.models.backbones.mindcv_models.layers.identity.Identity

Bases: nn.Cell

Identity

Source code in mindocr\models\backbones\mindcv_models\layers\identity.py
class Identity(nn.Cell):
    """Identity"""

    def construct(self, x):
        return x
mindocr.models.backbones.mindcv_models.layers.mlp

MLP module w/ dropout and configurable activation layer

mindocr.models.backbones.mindcv_models.layers.patch_embed

Image to Patch Embedding using Conv2d: a convolution-based approach to patchifying a 2D image w/ embedding projection.

mindocr.models.backbones.mindcv_models.layers.patch_embed.PatchEmbed

Bases: nn.Cell

Image to Patch Embedding

PARAMETER DESCRIPTION
image_size

Image size. Default: 224.

TYPE: int DEFAULT: 224

patch_size

Patch token size. Default: 4.

TYPE: int DEFAULT: 4

in_chans

Number of input image channels. Default: 3.

TYPE: int DEFAULT: 3

embed_dim

Number of linear projection output channels. Default: 96.

TYPE: int DEFAULT: 96

norm_layer

Normalization layer. Default: None

TYPE: nn.Cell DEFAULT: None

Source code in mindocr\models\backbones\mindcv_models\layers\patch_embed.py
class PatchEmbed(nn.Cell):
    """Image to Patch Embedding

    Args:
        image_size (int): Image size.  Default: 224.
        patch_size (int): Patch token size. Default: 4.
        in_chans (int): Number of input image channels. Default: 3.
        embed_dim (int): Number of linear projection output channels. Default: 96.
        norm_layer (nn.Cell, optional): Normalization layer. Default: None
    """

    def __init__(
        self,
        image_size: int = 224,
        patch_size: int = 4,
        in_chans: int = 3,
        embed_dim: int = 96,
        norm_layer: Optional[nn.Cell] = None,
    ) -> None:
        super().__init__()
        image_size = to_2tuple(image_size)
        patch_size = to_2tuple(patch_size)
        patches_resolution = [image_size[0] // patch_size[0], image_size[1] // patch_size[1]]
        self.image_size = image_size
        self.patch_size = patch_size
        self.patches_resolution = patches_resolution
        self.num_patches = patches_resolution[0] * patches_resolution[1]

        self.in_chans = in_chans
        self.embed_dim = embed_dim

        self.proj = nn.Conv2d(in_channels=in_chans, out_channels=embed_dim, kernel_size=patch_size, stride=patch_size,
                              pad_mode='pad', has_bias=True, weight_init="TruncatedNormal")

        if norm_layer is not None:
            if isinstance(embed_dim, int):
                embed_dim = (embed_dim,)
            self.norm = norm_layer(embed_dim, epsilon=1e-5)
        else:
            self.norm = None

    def construct(self, x: Tensor) -> Tensor:
        """docstring"""
        B = x.shape[0]
        # FIXME look at relaxing size constraints
        x = ops.Reshape()(self.proj(x), (B, self.embed_dim, -1))  # B Ph*Pw C
        x = ops.Transpose()(x, (0, 2, 1))

        if self.norm is not None:
            x = self.norm(x)
        return x
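The shape bookkeeping in `__init__` reduces to integer division: a stride-`patch_size` convolution turns an H x W image into an (H/p) x (W/p) grid of patches, and the flattened sequence length is the product. For the defaults:

```python
def patch_embed_shapes(image_size=224, patch_size=4, embed_dim=96):
    # Mirrors PatchEmbed's bookkeeping: grid resolution after patchifying,
    # number of patches (sequence length), and per-patch embedding dim.
    res = (image_size // patch_size, image_size // patch_size)
    num_patches = res[0] * res[1]
    return res, num_patches, embed_dim

print(patch_embed_shapes())  # ((56, 56), 3136, 96)
```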
mindocr.models.backbones.mindcv_models.layers.patch_embed.PatchEmbed.construct(x)

Apply the patch projection, flatten spatial dimensions, and optionally normalize.

Source code in mindocr\models\backbones\mindcv_models\layers\patch_embed.py
def construct(self, x: Tensor) -> Tensor:
    """docstring"""
    B = x.shape[0]
    # FIXME look at relaxing size constraints
    x = ops.Reshape()(self.proj(x), (B, self.embed_dim, -1))  # B Ph*Pw C
    x = ops.Transpose()(x, (0, 2, 1))

    if self.norm is not None:
        x = self.norm(x)
    return x
mindocr.models.backbones.mindcv_models.layers.pooling

GlobalAvgPooling Module

mindocr.models.backbones.mindcv_models.layers.pooling.GlobalAvgPooling

Bases: nn.Cell

GlobalAvgPooling, same as torch.nn.AdaptiveAvgPool2d when output shape is 1

Source code in mindocr\models\backbones\mindcv_models\layers\pooling.py
class GlobalAvgPooling(nn.Cell):
    """
    GlobalAvgPooling, same as torch.nn.AdaptiveAvgPool2d when output shape is 1
    """

    def __init__(self, keep_dims: bool = False) -> None:
        super().__init__()
        self.keep_dims = keep_dims

    def construct(self, x):
        x = ops.mean(x, axis=(2, 3), keep_dims=self.keep_dims)
        return x
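The cell is just a mean over the spatial axes of an NCHW tensor. A pure-Python sketch of `ops.mean(x, axis=(2, 3))` with `keep_dims=False`, using nested lists in place of a Tensor:

```python
def global_avg_pool(x):
    """x is a nested list shaped (N, C, H, W); average each channel's
    H x W map down to a single number, giving shape (N, C)."""
    return [
        [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in sample]
        for sample in x
    ]

x = [[[[1.0, 2.0], [3.0, 4.0]]]]  # shape (1, 1, 2, 2)
print(global_avg_pool(x))  # [[2.5]]
```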
mindocr.models.backbones.mindcv_models.layers.selective_kernel

Selective Kernel Convolution/Attention Paper: Selective Kernel Networks (https://arxiv.org/abs/1903.06586)

mindocr.models.backbones.mindcv_models.layers.selective_kernel.SelectiveKernel

Bases: nn.Cell

Selective Kernel Convolution Module As described in Selective Kernel Networks (https://arxiv.org/abs/1903.06586) with some modifications. Largest change is the input split, which divides the input channels across each convolution path, this can be viewed as a grouping of sorts, but the output channel counts expand to the module level value. This keeps the parameter count from ballooning when the convolutions themselves don't have groups, but still provides a noteworthy increase in performance over similar param count models without this attention layer. -Ross W

PARAMETER DESCRIPTION
in_channels

module input (feature) channel count

TYPE: int

out_channels

module output (feature) channel count

TYPE: int DEFAULT: None

kernel_size

kernel size for each convolution branch

TYPE: int, list DEFAULT: None

stride

stride for convolutions

TYPE: int DEFAULT: 1

dilation

dilation for module as a whole, impacts dilation of each branch

TYPE: int DEFAULT: 1

groups

number of groups for each branch

TYPE: int DEFAULT: 1

rd_ratio

reduction factor for attention features

TYPE: int, float DEFAULT: 1.0 / 16

rd_channels (int)

reduction channels can be specified directly by arg (if rd_channels is set)

rd_divisor (int)

divisor can be specified to keep channels % divisor == 0

keep_3x3

keep all branch convolution kernels as 3x3, changing larger kernels for dilations

TYPE: bool DEFAULT: True

split_input

split input channels evenly across each convolution branch, keeps param count lower, can be viewed as grouping by path, output expands to module out_channels count

TYPE: bool DEFAULT: True

activation

activation layer to use

TYPE: nn.Module DEFAULT: nn.ReLU

norm

batchnorm/norm layer to use

TYPE: nn.Module DEFAULT: nn.BatchNorm2d

Source code in mindocr\models\backbones\mindcv_models\layers\selective_kernel.py
class SelectiveKernel(nn.Cell):
    """Selective Kernel Convolution Module
    As described in Selective Kernel Networks (https://arxiv.org/abs/1903.06586) with some modifications.
    Largest change is the input split, which divides the input channels across each convolution path, this can
    be viewed as a grouping of sorts, but the output channel counts expand to the module level value. This keeps
    the parameter count from ballooning when the convolutions themselves don't have groups, but still provides
    a noteworthy increase in performance over similar param count models without this attention layer. -Ross W
    Args:
        in_channels (int):  module input (feature) channel count
        out_channels (int):  module output (feature) channel count
        kernel_size (int, list): kernel size for each convolution branch
        stride (int): stride for convolutions
        dilation (int): dilation for module as a whole, impacts dilation of each branch
        groups (int): number of groups for each branch
        rd_ratio (int, float): reduction factor for attention features
        rd_channels(int): reduction channels can be specified directly by arg (if rd_channels is set)
        rd_divisor(int): divisor can be specified to keep channels % divisor == 0
        keep_3x3 (bool): keep all branch convolution kernels as 3x3, changing larger kernels for dilations
        split_input (bool): split input channels evenly across each convolution branch, keeps param count lower,
            can be viewed as grouping by path, output expands to module out_channels count
        activation (nn.Module): activation layer to use
        norm (nn.Module): batchnorm/norm layer to use
    """

    def __init__(
        self,
        in_channels: int,
        out_channels: Optional[int] = None,
        kernel_size: Optional[Union[int, List]] = None,
        stride: int = 1,
        dilation: int = 1,
        groups: int = 1,
        rd_ratio: float = 1.0 / 16,
        rd_channels: Optional[int] = None,
        rd_divisor: int = 8,
        keep_3x3: bool = True,
        split_input: bool = True,
        activation: Optional[nn.Cell] = nn.ReLU,
        norm: Optional[nn.Cell] = nn.BatchNorm2d,
    ):
        super().__init__()
        out_channels = out_channels or in_channels
        kernel_size = kernel_size or [3, 5]  # default to one 3x3 and one 5x5 branch. 5x5 -> 3x3 + dilation
        _kernel_valid(kernel_size)
        if not isinstance(kernel_size, list):
            kernel_size = [kernel_size] * 2
        if keep_3x3:
            dilation = [dilation * (k - 1) // 2 for k in kernel_size]
            kernel_size = [3] * len(kernel_size)
        else:
            dilation = [dilation] * len(kernel_size)
        self.num_paths = len(kernel_size)
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.split_input = split_input
        if self.split_input:
            assert in_channels % self.num_paths == 0
            in_channels = in_channels // self.num_paths
        groups = min(out_channels, groups)

        self.paths = nn.CellList([
            Conv2dNormActivation(in_channels, out_channels, kernel_size=k, stride=stride, groups=groups,
                                 dilation=d, activation=activation, norm=norm)
            for k, d in zip(kernel_size, dilation)
        ])

        attn_channels = rd_channels or make_divisible(out_channels * rd_ratio, divisor=rd_divisor)
        self.attn = SelectiveKernelAttn(out_channels, self.num_paths, attn_channels)

    def construct(self, x: Tensor) -> Tensor:
        x_paths = []
        if self.split_input:
            x_split = ops.split(x, axis=1, output_num=self.num_paths)
            for i, op in enumerate(self.paths):
                x_paths.append(op(x_split[i]))
        else:
            for op in self.paths:
                x_paths.append(op(x))

        x = ops.stack(x_paths, axis=1)
        x_attn = self.attn(x)
        x = x * x_attn
        x = x.sum(1)
        return x
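The `keep_3x3` branch above replaces every larger kernel with a 3x3 kernel whose dilation is enlarged to preserve the branch's receptive field (a dilation-2 3x3 covers the same span as a plain 5x5). The transformation in isolation:

```python
def keep_3x3(kernel_size, dilation=1):
    # Swap each k x k kernel for a 3x3 kernel with dilation d*(k-1)//2,
    # as SelectiveKernel.__init__ does when keep_3x3=True.
    dilations = [dilation * (k - 1) // 2 for k in kernel_size]
    kernels = [3] * len(kernel_size)
    return kernels, dilations

print(keep_3x3([3, 5]))  # ([3, 3], [1, 2])
```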
mindocr.models.backbones.mindcv_models.layers.selective_kernel.SelectiveKernelAttn

Bases: nn.Cell

Selective Kernel Attention Module Selective Kernel attention mechanism factored out into its own module.

Source code in mindocr\models\backbones\mindcv_models\layers\selective_kernel.py
class SelectiveKernelAttn(nn.Cell):
    """Selective Kernel Attention Module
    Selective Kernel attention mechanism factored out into its own module.
    """

    def __init__(
        self,
        channels: int,
        num_paths: int = 2,
        attn_channels: int = 32,
        activation: Optional[nn.Cell] = nn.ReLU,
        norm: Optional[nn.Cell] = nn.BatchNorm2d,
    ):
        super().__init__()
        self.num_paths = num_paths
        self.mean = GlobalAvgPooling(keep_dims=True)
        self.fc_reduce = nn.Conv2d(channels, attn_channels, kernel_size=1, has_bias=False)
        self.bn = norm(attn_channels)
        self.act = activation()
        self.fc_select = nn.Conv2d(attn_channels, channels * num_paths, kernel_size=1)
        self.softmax = nn.Softmax(axis=1)

    def construct(self, x: Tensor) -> Tensor:
        x = self.mean((x.sum(1)))
        x = self.fc_reduce(x)
        x = self.bn(x)
        x = self.act(x)
        x = self.fc_select(x)
        b, c, h, w = x.shape
        x = x.reshape((b, self.num_paths, c // self.num_paths, h, w))
        x = self.softmax(x)
        return x
mindocr.models.backbones.mindcv_models.layers.squeeze_excite

Squeeze-and-Excitation Channel Attention: an SE implementation originally based on the PyTorch SE-Net impl. Has since evolved with additional functionality / configuration. Paper: Squeeze-and-Excitation Networks (https://arxiv.org/abs/1709.01507)

mindocr.models.backbones.mindcv_models.layers.squeeze_excite.SqueezeExcite

Bases: nn.Cell

SqueezeExcite Module as defined in original SE-Nets with a few additions.

Additions include
  • divisor can be specified to keep channels % div == 0 (default: 8)
  • reduction channels can be specified directly by arg (if rd_channels is set)
  • reduction channels can be specified by float rd_ratio (default: 1/16)
  • customizable activation, normalization, and gate layer
Source code in mindocr\models\backbones\mindcv_models\layers\squeeze_excite.py
class SqueezeExcite(nn.Cell):
    """SqueezeExcite Module as defined in original SE-Nets with a few additions.
    Additions include:
        * divisor can be specified to keep channels % div == 0 (default: 8)
        * reduction channels can be specified directly by arg (if rd_channels is set)
        * reduction channels can be specified by float rd_ratio (default: 1/16)
        * customizable activation, normalization, and gate layer
    """

    def __init__(
        self,
        in_channels: int,
        rd_ratio: float = 1.0 / 16,
        rd_channels: Optional[int] = None,
        rd_divisor: int = 8,
        norm: Optional[nn.Cell] = None,
        act_layer: nn.Cell = nn.ReLU,
        gate_layer: nn.Cell = nn.Sigmoid,
    ) -> None:
        super().__init__()
        self.norm = norm
        self.act = act_layer()
        self.gate = gate_layer()
        if not rd_channels:
            rd_channels = make_divisible(in_channels * rd_ratio, rd_divisor)

        self.conv_reduce = nn.Conv2d(
            in_channels=in_channels,
            out_channels=rd_channels,
            kernel_size=1,
            has_bias=True,
        )
        if self.norm:
            self.bn = nn.BatchNorm2d(rd_channels)
        self.conv_expand = nn.Conv2d(
            in_channels=rd_channels,
            out_channels=in_channels,
            kernel_size=1,
            has_bias=True,
        )
        self.pool = GlobalAvgPooling(keep_dims=True)

    def construct(self, x: Tensor) -> Tensor:
        x_se = self.pool(x)
        x_se = self.conv_reduce(x_se)
        if self.norm:
            x_se = self.bn(x_se)
        x_se = self.act(x_se)
        x_se = self.conv_expand(x_se)
        x_se = self.gate(x_se)
        x = x * x_se
        return x
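When `rd_channels` is not given, the bottleneck width is `make_divisible(in_channels * rd_ratio, rd_divisor)`. The helper itself lives elsewhere in mindcv_models; the definition below is the common one (round to the nearest multiple of the divisor, never dropping more than 10% below the target) and is an assumption, not a quote of the actual source:

```python
def make_divisible(v, divisor=8, min_value=None):
    # Assumed (timm-style) make_divisible: round v to the nearest multiple
    # of divisor, but never fall more than 10% below v.
    min_value = min_value or divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

# Reduction channels for in_channels=64 at the default rd_ratio of 1/16:
print(make_divisible(64 * 1.0 / 16))  # 8
```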
mindocr.models.backbones.mindcv_models.layers.squeeze_excite.SqueezeExciteV2

Bases: nn.Cell

SqueezeExcite Module as defined in original SE-Nets with a few additions. V1 uses 1x1 convolutions in place of the fully-connected layers, while V2 uses nn.Dense directly.

Source code in mindocr\models\backbones\mindcv_models\layers\squeeze_excite.py
class SqueezeExciteV2(nn.Cell):
    """SqueezeExcite Module as defined in original SE-Nets with a few additions.
    V1 uses 1x1 convolutions in place of the fully-connected layers, while V2 uses nn.Dense directly.
    """

    def __init__(
        self,
        in_channels: int,
        rd_ratio: float = 1.0 / 16,
        rd_channels: Optional[int] = None,
        rd_divisor: int = 8,
        norm: Optional[nn.Cell] = None,
        act_layer: nn.Cell = nn.ReLU,
        gate_layer: nn.Cell = nn.Sigmoid,
    ) -> None:
        super().__init__()
        self.norm = norm
        self.act = act_layer()
        self.gate = gate_layer()
        if not rd_channels:
            rd_channels = make_divisible(in_channels * rd_ratio, rd_divisor)

        self.conv_reduce = nn.Dense(
            in_channels=in_channels,
            out_channels=rd_channels,
            has_bias=True,
        )
        if self.norm:
            self.bn = nn.BatchNorm2d(rd_channels)
        self.conv_expand = nn.Dense(
            in_channels=rd_channels,
            out_channels=in_channels,
            has_bias=True,
        )
        self.pool = GlobalAvgPooling(keep_dims=False)

    def construct(self, x: Tensor) -> Tensor:
        x_se = self.pool(x)
        x_se = self.conv_reduce(x_se)
        if self.norm:
            x_se = self.bn(x_se)
        x_se = self.act(x_se)
        x_se = self.conv_expand(x_se)
        x_se = self.gate(x_se)
        x_se = ops.expand_dims(x_se, -1)
        x_se = ops.expand_dims(x_se, -1)
        x = x * x_se
        return x
mindocr.models.backbones.mindcv_models.mixnet

MindSpore implementation of MixNet. Refer to MixConv: Mixed Depthwise Convolutional Kernels

mindocr.models.backbones.mindcv_models.mixnet.MDConv

Bases: nn.Cell

Mixed Depth-wise Convolution

Source code in mindocr\models\backbones\mindcv_models\mixnet.py
class MDConv(nn.Cell):
    """Mixed Depth-wise Convolution"""

    def __init__(self, channels: int, kernel_size: list, stride: int) -> None:
        super(MDConv, self).__init__()
        self.num_groups = len(kernel_size)

        if self.num_groups == 1:
            self.mixed_depthwise_conv = nn.Conv2d(
                channels,
                channels,
                kernel_size[0],
                stride=stride,
                pad_mode="pad",
                padding=kernel_size[0] // 2,
                group=channels,
                has_bias=False
            )
        else:
            self.split_channels = _splitchannels(channels, self.num_groups)

            self.mixed_depthwise_conv = nn.CellList()
            for i in range(self.num_groups):
                self.mixed_depthwise_conv.append(nn.Conv2d(
                    self.split_channels[i],
                    self.split_channels[i],
                    kernel_size[i],
                    stride=stride,
                    pad_mode="pad",
                    padding=kernel_size[i] // 2,
                    group=self.split_channels[i],
                    has_bias=False
                ))

    def construct(self, x: Tensor) -> Tensor:
        if self.num_groups == 1:
            return self.mixed_depthwise_conv(x)

        output = []
        start, end = 0, 0
        for i in range(self.num_groups):
            start, end = end, end + self.split_channels[i]
            x_split = x[:, start:end]

            conv = self.mixed_depthwise_conv[i]
            output.append(conv(x_split))

        return ops.concat(output, axis=1)
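`construct` walks the channel axis with a running `(start, end)` window, one slice per kernel size. The `_splitchannels` helper is not shown in this chunk; the version below is a hypothetical stand-in (even split, remainder to group 0) used only to illustrate the slicing bookkeeping:

```python
def split_channels(channels, num_groups):
    # Hypothetical stand-in for _splitchannels: divide channels as evenly
    # as possible, giving any remainder to the first group.
    split = [channels // num_groups] * num_groups
    split[0] += channels - sum(split)
    return split

# Slice ranges as computed in MDConv.construct's start/end loop:
channels, groups = 100, 3
split = split_channels(channels, groups)
ranges, start = [], 0
for c in split:
    ranges.append((start, start + c))
    start += c
print(split, ranges)  # [34, 33, 33] [(0, 34), (34, 67), (67, 100)]
```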
mindocr.models.backbones.mindcv_models.mixnet.MixNet

Bases: nn.Cell

MixNet model class, based on "MixConv: Mixed Depthwise Convolutional Kernels" (https://arxiv.org/abs/1907.09595)

PARAMETER DESCRIPTION
arch

size of the architecture. "small", "medium" or "large". Default: "small".

TYPE: str DEFAULT: 'small'

num_classes

number of classification classes. Default: 1000.

TYPE: int DEFAULT: 1000

in_channels

number of the channels of the input. Default: 3.

TYPE: int DEFAULT: 3

feature_size

number of channels of the output features. Default: 1536.

TYPE: int DEFAULT: 1536

drop_rate

rate of dropout for classifier. Default: 0.2.

TYPE: float DEFAULT: 0.2

depth_multiplier

expansion coefficient of channels. Default: 1.0.

TYPE: float DEFAULT: 1.0

Source code in mindocr\models\backbones\mindcv_models\mixnet.py
class MixNet(nn.Cell):
    r"""MixNet model class, based on
    `"MixConv: Mixed Depthwise Convolutional Kernels" <https://arxiv.org/abs/1907.09595>`_

    Args:
        arch: size of the architecture. "small", "medium" or "large". Default: "small".
        num_classes: number of classification classes. Default: 1000.
        in_channels: number of the channels of the input. Default: 3.
        feature_size: number of channels of the output features. Default: 1536.
        drop_rate: rate of dropout for classifier. Default: 0.2.
        depth_multiplier: expansion coefficient of channels. Default: 1.0.
    """

    def __init__(
        self,
        arch: str = "small",
        num_classes: int = 1000,
        in_channels: int = 3,
        feature_size: int = 1536,
        drop_rate: float = 0.2,
        depth_multiplier: float = 1.0
    ) -> None:
        super(MixNet, self).__init__()
        if arch == "small":
            block_configs = [
                [16, 16, [3], [1], [1], 1, 1, "ReLU", 0.0],
                [16, 24, [3], [1, 1], [1, 1], 2, 6, "ReLU", 0.0],
                [24, 24, [3], [1, 1], [1, 1], 1, 3, "ReLU", 0.0],
                [24, 40, [3, 5, 7], [1], [1], 2, 6, "Swish", 0.5],
                [40, 40, [3, 5], [1, 1], [1, 1], 1, 6, "Swish", 0.5],
                [40, 40, [3, 5], [1, 1], [1, 1], 1, 6, "Swish", 0.5],
                [40, 40, [3, 5], [1, 1], [1, 1], 1, 6, "Swish", 0.5],
                [40, 80, [3, 5, 7], [1], [1, 1], 2, 6, "Swish", 0.25],
                [80, 80, [3, 5], [1], [1, 1], 1, 6, "Swish", 0.25],
                [80, 80, [3, 5], [1], [1, 1], 1, 6, "Swish", 0.25],
                [80, 120, [3, 5, 7], [1, 1], [1, 1], 1, 6, "Swish", 0.5],
                [120, 120, [3, 5, 7, 9], [1, 1], [1, 1], 1, 3, "Swish", 0.5],
                [120, 120, [3, 5, 7, 9], [1, 1], [1, 1], 1, 3, "Swish", 0.5],
                [120, 200, [3, 5, 7, 9, 11], [1], [1], 2, 6, "Swish", 0.5],
                [200, 200, [3, 5, 7, 9], [1], [1, 1], 1, 6, "Swish", 0.5],
                [200, 200, [3, 5, 7, 9], [1], [1, 1], 1, 6, "Swish", 0.5]
            ]
            stem_channels = 16
        else:
            block_configs = [
                [24, 24, [3], [1], [1], 1, 1, "ReLU", 0.0],
                [24, 32, [3, 5, 7], [1, 1], [1, 1], 2, 6, "ReLU", 0.0],
                [32, 32, [3], [1, 1], [1, 1], 1, 3, "ReLU", 0.0],
                [32, 40, [3, 5, 7, 9], [1], [1], 2, 6, "Swish", 0.5],
                [40, 40, [3, 5], [1, 1], [1, 1], 1, 6, "Swish", 0.5],
                [40, 40, [3, 5], [1, 1], [1, 1], 1, 6, "Swish", 0.5],
                [40, 40, [3, 5], [1, 1], [1, 1], 1, 6, "Swish", 0.5],
                [40, 80, [3, 5, 7], [1], [1], 2, 6, "Swish", 0.25],
                [80, 80, [3, 5, 7, 9], [1, 1], [1, 1], 1, 6, "Swish", 0.25],
                [80, 80, [3, 5, 7, 9], [1, 1], [1, 1], 1, 6, "Swish", 0.25],
                [80, 80, [3, 5, 7, 9], [1, 1], [1, 1], 1, 6, "Swish", 0.25],
                [80, 120, [3], [1], [1], 1, 6, "Swish", 0.5],
                [120, 120, [3, 5, 7, 9], [1, 1], [1, 1], 1, 3, "Swish", 0.5],
                [120, 120, [3, 5, 7, 9], [1, 1], [1, 1], 1, 3, "Swish", 0.5],
                [120, 120, [3, 5, 7, 9], [1, 1], [1, 1], 1, 3, "Swish", 0.5],
                [120, 200, [3, 5, 7, 9], [1], [1], 2, 6, "Swish", 0.5],
                [200, 200, [3, 5, 7, 9], [1], [1, 1], 1, 6, "Swish", 0.5],
                [200, 200, [3, 5, 7, 9], [1], [1, 1], 1, 6, "Swish", 0.5],
                [200, 200, [3, 5, 7, 9], [1], [1, 1], 1, 6, "Swish", 0.5]
            ]
            if arch == "medium":
                stem_channels = 24
            elif arch == "large":
                stem_channels = 24
                depth_multiplier *= 1.3
            else:
                raise ValueError(f"Unsupported model type {arch}")

        if depth_multiplier != 1.0:
            stem_channels = _roundchannels(stem_channels * depth_multiplier)

            for i, conf in enumerate(block_configs):
                conf_ls = list(conf)
                conf_ls[0] = _roundchannels(conf_ls[0] * depth_multiplier)
                conf_ls[1] = _roundchannels(conf_ls[1] * depth_multiplier)
                block_configs[i] = tuple(conf_ls)

        # stem convolution
        self.stem_conv = nn.SequentialCell([
            nn.Conv2d(in_channels, stem_channels, 3, stride=2, pad_mode="pad", padding=1),
            nn.BatchNorm2d(stem_channels),
            nn.ReLU()
        ])

        # building MixNet blocks
        layers = []
        for inc, outc, k, ek, pk, s, er, ac, se in block_configs:
            layers.append(MixNetBlock(
                inc,
                outc,
                kernel_size=k,
                expand_ksize=ek,
                project_ksize=pk,
                stride=s,
                expand_ratio=er,
                activation=ac,
                se_ratio=se
            ))
        self.layers = nn.SequentialCell(layers)

        # head
        self.head_conv = nn.SequentialCell([
            nn.Conv2d(block_configs[-1][1], feature_size, 1, pad_mode="pad", padding=0),
            nn.BatchNorm2d(feature_size),
            nn.ReLU()
        ])

        self.pool = GlobalAvgPooling()
        self.dropout = nn.Dropout(keep_prob=1 - drop_rate)
        self.classifier = nn.Dense(feature_size, num_classes)

        self._initialize_weights()

    def _initialize_weights(self) -> None:
        """Initialize weights for cells."""
        for _, cell in self.cells_and_names():
            if isinstance(cell, nn.Conv2d):
                fan_out = cell.kernel_size[0] * cell.kernel_size[1] * cell.out_channels
                cell.weight.set_data(
                    init.initializer(init.Normal(math.sqrt(2.0 / fan_out)),
                                     cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(
                        init.initializer("zeros", cell.bias.shape, cell.bias.dtype))
            elif isinstance(cell, nn.BatchNorm2d):
                cell.gamma.set_data(init.initializer("ones", cell.gamma.shape, cell.gamma.dtype))
                cell.beta.set_data(init.initializer("zeros", cell.beta.shape, cell.beta.dtype))
            elif isinstance(cell, nn.Dense):
                cell.weight.set_data(
                    init.initializer(init.Uniform(1.0 / math.sqrt(cell.weight.shape[0])),
                                     cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(init.initializer("zeros", cell.bias.shape, cell.bias.dtype))

    def forward_features(self, x: Tensor) -> Tensor:
        x = self.stem_conv(x)
        x = self.layers(x)
        x = self.head_conv(x)
        return x

    def forward_head(self, x: Tensor) -> Tensor:
        x = self.pool(x)
        x = self.dropout(x)
        x = self.classifier(x)
        return x

    def construct(self, x: Tensor) -> Tensor:
        x = self.forward_features(x)
        x = self.forward_head(x)
        return x
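When `arch == "large"`, `depth_multiplier` becomes 1.3 and every in/out channel count in `block_configs` is rescaled through `_roundchannels`. A minimal sketch of that rescaling, assuming the helper rounds to the nearest multiple of 8 without dropping more than 10% of channels (the usual convention in mobile architectures; the actual `_roundchannels` may differ):

```python
def roundchannels(c, divisor=8):
    # Hypothetical stand-in for _roundchannels: round to the nearest
    # multiple of `divisor`, never reducing channels by more than 10%.
    new_c = max(divisor, int(c + divisor / 2) // divisor * divisor)
    if new_c < 0.9 * c:
        new_c += divisor
    return new_c

depth_multiplier = 1.0 * 1.3          # arch == "large"
block = [24, 32]                      # (in_channels, out_channels) of one block
scaled = [roundchannels(c * depth_multiplier) for c in block]
print(scaled)                         # [32, 40] under this rounding rule
```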
mindocr.models.backbones.mindcv_models.mixnet.MixNetBlock

Bases: nn.Cell

Basic Block of MixNet

Source code in mindocr\models\backbones\mindcv_models\mixnet.py, lines 168-223
class MixNetBlock(nn.Cell):
    """Basic Block of MixNet"""

    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        kernel_size: list = [3],
        expand_ksize: list = [1],
        project_ksize: list = [1],
        stride: int = 1,
        expand_ratio: int = 1,
        activation: str = "ReLU",
        se_ratio: float = 0.0,
    ) -> None:
        super(MixNetBlock, self).__init__()
        assert activation in ["ReLU", "Swish"]
        self.activation = Swish if activation == "Swish" else nn.ReLU

        expand_channels = in_channels * expand_ratio
        self.residual_connection = (stride == 1 and in_channels == out_channels)

        conv = []
        if expand_ratio != 1:
            # expand
            conv.extend([
                GroupedConv2d(in_channels, expand_channels, expand_ksize),
                nn.BatchNorm2d(expand_channels),
                self.activation()
            ])

        # depthwise
        conv.extend([
            MDConv(expand_channels, kernel_size, stride),
            nn.BatchNorm2d(expand_channels),
            self.activation()
        ])

        if se_ratio > 0:
            squeeze_channels = int(in_channels * se_ratio)
            squeeze_excite = SqueezeExcite(expand_channels, rd_channels=squeeze_channels)
            conv.append(squeeze_excite)

        # projection phase
        conv.extend([
            GroupedConv2d(expand_channels, out_channels, project_ksize),
            nn.BatchNorm2d(out_channels)
        ])

        self.convs = nn.SequentialCell(conv)

    def construct(self, x: Tensor) -> Tensor:
        if self.residual_connection:
            return x + self.convs(x)
        else:
            return self.convs(x)
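The `MDConv` used in the depthwise stage splits the expanded channels into one group per kernel size. A small sketch of one plausible even-split scheme (the remainder handling in the real `MDConv` may differ):

```python
def split_channels(total, num_groups):
    # Even split with the remainder folded into the first group,
    # a common convention for mixed-kernel depthwise convs.
    base = total // num_groups
    split = [base] * num_groups
    split[0] += total - base * num_groups
    return split

# 240 expanded channels over kernel sizes [3, 5, 7, 9]
print(split_channels(240, 4))  # [60, 60, 60, 60]
print(split_channels(100, 3))  # [34, 33, 33]
```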
mindocr.models.backbones.mindcv_models.mlpmixer

MindSpore implementation of MLP-Mixer. Refer to MLP-Mixer: An all-MLP Architecture for Vision.

mindocr.models.backbones.mindcv_models.mlpmixer.FeedForward

Bases: nn.Cell

Feed Forward Block. MLP Layer. FC -> GELU -> FC

Source code in mindocr\models\backbones\mindcv_models\mlpmixer.py, lines 21-35
class FeedForward(nn.Cell):
    """Feed Forward Block. MLP Layer. FC -> GELU -> FC"""

    def __init__(self, dim, hidden_dim, dropout=0.):
        super(FeedForward, self).__init__()
        self.net = nn.SequentialCell(
            nn.Dense(dim, hidden_dim),
            nn.GELU(),
            nn.Dropout(keep_prob=1 - dropout),
            nn.Dense(hidden_dim, dim),
            nn.Dropout(keep_prob=1 - dropout)
        )

    def construct(self, x):
        return self.net(x)
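The FC -> GELU -> FC pattern can be sketched in NumPy (tanh approximation of GELU; dropout omitted for clarity):

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU, as used in many implementations
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def feed_forward(x, w1, b1, w2, b2):
    # FC -> GELU -> FC, matching the block above with dropout disabled
    return gelu(x @ w1 + b1) @ w2 + b2

rng = np.random.default_rng(0)
dim, hidden = 8, 32
x = rng.standard_normal((4, dim))
out = feed_forward(x,
                   rng.standard_normal((dim, hidden)), np.zeros(hidden),
                   rng.standard_normal((hidden, dim)), np.zeros(dim))
print(out.shape)  # (4, 8) -- the block preserves the feature dimension
```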
mindocr.models.backbones.mindcv_models.mlpmixer.MLPMixer

Bases: nn.Cell

MLP-Mixer model class, based on "MLP-Mixer: An all-MLP Architecture for Vision" (https://arxiv.org/abs/2105.01601)

PARAMETER DESCRIPTION
depth (int): number of MixerBlocks.
patch_size (int or tuple): size of a single image patch.
n_patches (int): number of patches.
n_channels (int): channels (dimension) of a single embedded patch.
token_dim (int): hidden dim of token-mixing MLP.
channel_dim (int): hidden dim of channel-mixing MLP.
n_classes (int): number of classification classes. Default: 1000.

Source code in mindocr\models\backbones\mindcv_models\mlpmixer.py, lines 79-119
class MLPMixer(nn.Cell):
    r"""MLP-Mixer model class, based on
    `"MLP-Mixer: An all-MLP Architecture for Vision" <https://arxiv.org/abs/2105.01601>`_

    Args:
        depth (int) : number of MixerBlocks.
        patch_size (int or tuple) : size of a single image patch.
        n_patches (int) : number of patches.
        n_channels (int) : channels(dimension) of a single embedded patch.
        token_dim (int) : hidden dim of token-mixing MLP.
        channel_dim (int) : hidden dim of channel-mixing MLP.
        n_classes (int) : number of classification classes.
    """

    def __init__(self, depth, patch_size, n_patches, n_channels, token_dim, channel_dim, n_classes=1000):
        super().__init__()
        self.n_patches = n_patches
        self.n_channels = n_channels
        # patch with shape of (3, patch_size, patch_size) is embedded to n_channels dim feature.
        self.to_patch_embedding = nn.SequentialCell(
            nn.Conv2d(3, n_channels, patch_size, patch_size, pad_mode="pad", padding=0),
            TransPose(permutation=(0, 2, 1), embedding=True),
        )
        self.mixer_blocks = nn.SequentialCell()
        for _ in range(depth):
            self.mixer_blocks.append(MixerBlock(n_patches, n_channels, token_dim, channel_dim))
        self.layer_norm = nn.LayerNorm((n_channels,))
        self.mlp_head = nn.Dense(n_channels, n_classes)
        self.mean = ops.ReduceMean()
        self._initialize_weights()

    def construct(self, x):
        x = self.to_patch_embedding(x)
        x = self.mixer_blocks(x)
        x = self.layer_norm(x)
        x = self.mean(x, 1)
        return self.mlp_head(x)

    def _initialize_weights(self):
        # todo: implement weights init
        pass
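Because the patch embedding conv uses a stride equal to `patch_size`, the patches are non-overlapping and `n_patches` must equal `(H // patch_size) * (W // patch_size)` for the input resolution. A quick sanity-check helper (assuming square patches, as in the constructor above):

```python
def num_patches(image_size, patch_size):
    # Non-overlapping patches: the stride of the embedding conv equals
    # the patch size, so each axis contributes size // patch_size cells.
    h, w = (image_size, image_size) if isinstance(image_size, int) else image_size
    return (h // patch_size) * (w // patch_size)

print(num_patches(224, 16))        # 196
print(num_patches(224, 32))        # 49
print(num_patches((224, 448), 32)) # 98
```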
mindocr.models.backbones.mindcv_models.mlpmixer.MixerBlock

Bases: nn.Cell

Mixer Layer with token-mixing MLP and channel-mixing MLP

Source code in mindocr\models\backbones\mindcv_models\mlpmixer.py, lines 57-76
class MixerBlock(nn.Cell):
    """Mixer Layer with token-mixing MLP and channel-mixing MLP"""

    def __init__(self, n_patches, n_channels, token_dim, channel_dim, dropout=0.):
        super().__init__()
        self.token_mix = nn.SequentialCell(
            nn.LayerNorm((n_channels,)),
            TransPose((0, 2, 1)),
            FeedForward(n_patches, token_dim, dropout),
            TransPose((0, 2, 1))
        )
        self.channel_mix = nn.SequentialCell(
            nn.LayerNorm((n_channels,)),
            FeedForward(n_channels, channel_dim, dropout),
        )

    def construct(self, x):
        x = x + self.token_mix(x)
        x = x + self.channel_mix(x)
        return x
mindocr.models.backbones.mindcv_models.mlpmixer.TransPose

Bases: nn.Cell

TransPose Layer. Wrap operator Transpose for easy integration in nn.SequentialCell

Source code in mindocr\models\backbones\mindcv_models\mlpmixer.py, lines 38-54
class TransPose(nn.Cell):
    """TransPose Layer. Wrap operator Transpose for easy integration in nn.SequentialCell"""

    def __init__(self, permutation=(0, 2, 1), embedding=False):
        super(TransPose, self).__init__()
        self.permutation = permutation
        self.embedding = embedding
        if embedding:
            self.reshape = ops.Reshape()
        self.transpose = ops.Transpose()

    def construct(self, x):
        if self.embedding:
            b, c, h, w = x.shape
            x = self.reshape(x, (b, c, h * w))
        x = self.transpose(x, self.permutation)
        return x
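In embedding mode, TransPose flattens the spatial grid and moves channels last, turning a conv feature map into a token sequence. The same shape transformation in NumPy:

```python
import numpy as np

x = np.zeros((2, 64, 14, 14))  # (b, c, h, w) after the patch-embedding conv
b, c, h, w = x.shape
x = x.reshape(b, c, h * w)     # (b, c, n_patches)
x = x.transpose(0, 2, 1)       # (b, n_patches, c) -- patches become tokens
print(x.shape)                 # (2, 196, 64)
```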
mindocr.models.backbones.mindcv_models.mnasnet

MindSpore implementation of MnasNet. Refer to MnasNet: Platform-Aware Neural Architecture Search for Mobile.

mindocr.models.backbones.mindcv_models.mnasnet.Mnasnet

Bases: nn.Cell

MnasNet model architecture from "MnasNet: Platform-Aware Neural Architecture Search for Mobile" (https://arxiv.org/abs/1807.11626)

PARAMETER DESCRIPTION
alpha (float): scale factor of model width.
in_channels (int): number of the channels of the input. Default: 3.
num_classes (int): number of classification classes. Default: 1000.
drop_rate (float): dropout rate of the layer before main classifier. Default: 0.2.

Source code in mindocr\models\backbones\mindcv_models\mnasnet.py, lines 80-176
class Mnasnet(nn.Cell):
    r"""MnasNet model architecture from
    `"MnasNet: Platform-Aware Neural Architecture Search for Mobile" <https://arxiv.org/abs/1807.11626>`_.

    Args:
        alpha: scale factor of model width.
        in_channels: number of the channels of the input. Default: 3.
        num_classes: number of classification classes. Default: 1000.
        drop_rate: dropout rate of the layer before main classifier. Default: 0.2.
    """

    def __init__(
        self,
        alpha: float,
        in_channels: int = 3,
        num_classes: int = 1000,
        drop_rate: float = 0.2,
    ):
        super().__init__()

        inverted_residual_setting = [
            # t, c, n, s, k
            [3, 24, 3, 2, 3],  # -> 56x56
            [3, 40, 3, 2, 5],  # -> 28x28
            [6, 80, 3, 2, 5],  # -> 14x14
            [6, 96, 2, 1, 3],  # -> 14x14
            [6, 192, 4, 2, 5],  # -> 7x7
            [6, 320, 1, 1, 3],  # -> 7x7
        ]

        mid_channels = make_divisible(32 * alpha, 8)
        input_channels = make_divisible(16 * alpha, 8)

        features: List[nn.Cell] = [
            nn.Conv2d(in_channels, mid_channels, kernel_size=3, stride=2, pad_mode="pad", padding=1),
            nn.BatchNorm2d(mid_channels, momentum=0.99, eps=1e-3),
            nn.ReLU(),
            nn.Conv2d(mid_channels, mid_channels, kernel_size=3, stride=1, pad_mode="pad", padding=1,
                      group=mid_channels),
            nn.BatchNorm2d(mid_channels, momentum=0.99, eps=1e-3),
            nn.ReLU(),
            nn.Conv2d(mid_channels, input_channels, kernel_size=1, stride=1),
            nn.BatchNorm2d(input_channels, momentum=0.99, eps=1e-3),
        ]

        for t, c, n, s, k in inverted_residual_setting:
            output_channels = make_divisible(c * alpha, 8)
            for i in range(n):
                stride = s if i == 0 else 1
                features.append(InvertedResidual(input_channels, output_channels,
                                                 stride=stride, kernel_size=k, expand_ratio=t))
                input_channels = output_channels

        features.extend([
            nn.Conv2d(input_channels, 1280, kernel_size=1, stride=1),
            nn.BatchNorm2d(1280, momentum=0.99, eps=1e-3),
            nn.ReLU(),
        ])
        self.features = nn.SequentialCell(features)
        self.pool = GlobalAvgPooling()
        self.dropout = nn.Dropout(keep_prob=1 - drop_rate)
        self.classifier = nn.Dense(1280, num_classes)
        self._initialize_weights()

    def _initialize_weights(self) -> None:
        """Initialize weights for cells."""
        for _, cell in self.cells_and_names():
            if isinstance(cell, nn.Conv2d):
                cell.weight.set_data(
                    init.initializer(init.HeNormal(mode="fan_out", nonlinearity="relu"),
                                     cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(init.initializer("zeros", cell.bias.shape, cell.bias.dtype))
            elif isinstance(cell, nn.BatchNorm2d):
                cell.gamma.set_data(init.initializer("ones", cell.gamma.shape, cell.gamma.dtype))
                cell.beta.set_data(init.initializer("zeros", cell.beta.shape, cell.beta.dtype))
            elif isinstance(cell, nn.Dense):
                cell.weight.set_data(
                    init.initializer(init.HeUniform(mode="fan_out", nonlinearity="sigmoid"),
                                     cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(init.initializer("zeros", cell.bias.shape, cell.bias.dtype))

    def forward_features(self, x: Tensor) -> Tensor:
        x = self.features(x)
        return x

    def forward_head(self, x: Tensor) -> Tensor:
        x = self.pool(x)
        x = self.dropout(x)
        x = self.classifier(x)
        return x

    def construct(self, x: Tensor) -> Tensor:
        x = self.forward_features(x)
        x = self.forward_head(x)
        return x
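The stage loop above expands each `(t, c, n, s, k)` row into `n` blocks, applying the stride `s` only to the first block of a stage. A sketch of that bookkeeping, with `make_divisible` assumed to round to the nearest multiple of 8 without dropping more than 10% of channels (the widely used definition; the actual helper may differ):

```python
def make_divisible(v, divisor=8):
    # Assumed behavior: round to the nearest multiple of `divisor`,
    # never reducing channels by more than 10%.
    new_v = max(divisor, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

alpha = 1.0
input_channels = make_divisible(16 * alpha)
blocks = []  # (in_channels, out_channels, stride, kernel, expand_ratio)
for t, c, n, s, k in [[3, 24, 3, 2, 3], [3, 40, 3, 2, 5]]:
    out = make_divisible(c * alpha)
    for i in range(n):
        stride = s if i == 0 else 1   # stride only on the first block
        blocks.append((input_channels, out, stride, k, t))
        input_channels = out

print(blocks)
# [(16, 24, 2, 3, 3), (24, 24, 1, 3, 3), (24, 24, 1, 3, 3),
#  (24, 40, 2, 5, 3), (40, 40, 1, 5, 3), (40, 40, 1, 5, 3)]
```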
mindocr.models.backbones.mindcv_models.mnasnet.mnasnet0_5(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get MnasNet model with width scaled by 0.5. Refer to the base class models.Mnasnet for more details.

Source code in mindocr\models\backbones\mindcv_models\mnasnet.py, lines 179-189
@register_model
def mnasnet0_5(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> Mnasnet:
    """Get MnasNet model with width scaled by 0.5.
    Refer to the base class `models.Mnasnet` for more details."""
    default_cfg = default_cfgs["mnasnet0.5"]
    model = Mnasnet(alpha=0.5, in_channels=in_channels, num_classes=num_classes, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.mnasnet.mnasnet0_75(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get MnasNet model with width scaled by 0.75. Refer to the base class models.Mnasnet for more details.

Source code in mindocr\models\backbones\mindcv_models\mnasnet.py, lines 192-202
@register_model
def mnasnet0_75(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> Mnasnet:
    """Get MnasNet model with width scaled by 0.75.
    Refer to the base class `models.Mnasnet` for more details."""
    default_cfg = default_cfgs["mnasnet0.75"]
    model = Mnasnet(alpha=0.75, in_channels=in_channels, num_classes=num_classes, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.mnasnet.mnasnet1_0(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get MnasNet model with width scaled by 1.0. Refer to the base class models.Mnasnet for more details.

Source code in mindocr\models\backbones\mindcv_models\mnasnet.py, lines 205-215
@register_model
def mnasnet1_0(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> Mnasnet:
    """Get MnasNet model with width scaled by 1.0.
    Refer to the base class `models.Mnasnet` for more details."""
    default_cfg = default_cfgs["mnasnet1.0"]
    model = Mnasnet(alpha=1.0, in_channels=in_channels, num_classes=num_classes, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.mnasnet.mnasnet1_3(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get MnasNet model with width scaled by 1.3. Refer to the base class models.Mnasnet for more details.

Source code in mindocr\models\backbones\mindcv_models\mnasnet.py, lines 218-228
@register_model
def mnasnet1_3(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> Mnasnet:
    """Get MnasNet model with width scaled by 1.3.
    Refer to the base class `models.Mnasnet` for more details."""
    default_cfg = default_cfgs["mnasnet1.3"]
    model = Mnasnet(alpha=1.3, in_channels=in_channels, num_classes=num_classes, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.mnasnet.mnasnet1_4(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get MnasNet model with width scaled by 1.4. Refer to the base class models.Mnasnet for more details.

Source code in mindocr\models\backbones\mindcv_models\mnasnet.py, lines 231-241
@register_model
def mnasnet1_4(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> Mnasnet:
    """Get MnasNet model with width scaled by 1.4.
    Refer to the base class `models.Mnasnet` for more details."""
    default_cfg = default_cfgs["mnasnet1.4"]
    model = Mnasnet(alpha=1.4, in_channels=in_channels, num_classes=num_classes, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.mobilenet_v1

MindSpore implementation of MobileNetV1. Refer to MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications.

mindocr.models.backbones.mindcv_models.mobilenet_v1.MobileNetV1

Bases: nn.Cell

MobileNetV1 model class, based on "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications" (https://arxiv.org/abs/1704.04861)

PARAMETER DESCRIPTION
alpha (float): scale factor of model width. Default: 1.0.
in_channels (int): number of the channels of the input. Default: 3.
num_classes (int): number of classification classes. Default: 1000.

Source code in mindocr\models\backbones\mindcv_models\mobilenet_v1.py, lines 61-134
class MobileNetV1(nn.Cell):
    r"""MobileNetV1 model class, based on
    `"MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications" <https://arxiv.org/abs/1704.04861>`_  # noqa: E501

    Args:
        alpha: scale factor of model width. Default: 1.
        in_channels: number of the channels of the input. Default: 3.
        num_classes: number of classification classes. Default: 1000.
    """

    def __init__(
        self,
        alpha: float = 1.0,
        in_channels: int = 3,
        num_classes: int = 1000,
    ) -> None:
        super().__init__()
        input_channels = int(32 * alpha)
        # Setting of depth-wise separable conv
        # c: number of output channel
        # s: stride of depth-wise conv
        block_setting = [
            # c, s
            [64, 1],
            [128, 2],
            [128, 1],
            [256, 2],
            [256, 1],
            [512, 2],
            [512, 1],
            [512, 1],
            [512, 1],
            [512, 1],
            [512, 1],
            [1024, 2],
            [1024, 1],
        ]

        features = [
            nn.Conv2d(in_channels, input_channels, 3, 2, pad_mode="pad", padding=1, has_bias=False),
            nn.BatchNorm2d(input_channels),
            nn.ReLU(),
        ]
        for c, s in block_setting:
            output_channel = int(c * alpha)
            features.append(depthwise_separable_conv(input_channels, output_channel, s))
            input_channels = output_channel
        self.features = nn.SequentialCell(features)

        self.pool = GlobalAvgPooling()
        self.classifier = nn.Dense(input_channels, num_classes)
        self._initialize_weights()

    def _initialize_weights(self) -> None:
        """Initialize weights for cells."""
        for _, cell in self.cells_and_names():
            if isinstance(cell, nn.Conv2d):
                cell.weight.set_data(init.initializer(init.XavierUniform(), cell.weight.shape, cell.weight.dtype))
            if isinstance(cell, nn.Dense):
                cell.weight.set_data(init.initializer(init.TruncatedNormal(), cell.weight.shape, cell.weight.dtype))

    def forward_features(self, x: Tensor) -> Tensor:
        x = self.features(x)
        return x

    def forward_head(self, x: Tensor) -> Tensor:
        x = self.pool(x)
        x = self.classifier(x)
        return x

    def construct(self, x: Tensor) -> Tensor:
        x = self.forward_features(x)
        x = self.forward_head(x)
        return x
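MobileNetV1's efficiency comes from replacing each standard convolution with a depthwise conv followed by a 1x1 pointwise conv. A quick parameter count (biases and batch norm ignored) shows the saving:

```python
def standard_conv_params(cin, cout, k=3):
    # A standard conv learns a k*k*cin filter for each output channel.
    return k * k * cin * cout

def depthwise_separable_params(cin, cout, k=3):
    # One k*k depthwise filter per input channel, then a 1x1 pointwise conv.
    return k * k * cin + cin * cout

cin, cout = 256, 512
std = standard_conv_params(cin, cout)        # 1179648
sep = depthwise_separable_params(cin, cout)  # 133376
print(std, sep, round(std / sep, 1))         # roughly 8.8x fewer parameters
```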
mindocr.models.backbones.mindcv_models.mobilenet_v1.mobilenet_v1_025_224(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get MobileNetV1 model with width scaled by 0.25. Refer to the base class models.MobileNetV1 for more details.

Source code in mindocr\models\backbones\mindcv_models\mobilenet_v1.py, lines 137-148
@register_model
def mobilenet_v1_025_224(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> MobileNetV1:
    """Get MobileNetV1 model with width scaled by 0.25.
    Refer to the base class `models.MobileNetV1` for more details.
    """
    default_cfg = default_cfgs["mobilenet_v1_0.25_224"]
    model = MobileNetV1(alpha=0.25, in_channels=in_channels, num_classes=num_classes, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.mobilenet_v1.mobilenet_v1_050_224(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get MobileNetV1 model with width scaled by 0.5. Refer to the base class models.MobileNetV1 for more details.

Source code in mindocr\models\backbones\mindcv_models\mobilenet_v1.py, lines 151-162
@register_model
def mobilenet_v1_050_224(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> MobileNetV1:
    """Get MobileNetV1 model with width scaled by 0.5.
    Refer to the base class `models.MobileNetV1` for more details.
    """
    default_cfg = default_cfgs["mobilenet_v1_0.5_224"]
    model = MobileNetV1(alpha=0.5, in_channels=in_channels, num_classes=num_classes, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.mobilenet_v1.mobilenet_v1_075_224(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get MobileNetV1 model with width scaled by 0.75. Refer to the base class models.MobileNetV1 for more details.

Source code in mindocr\models\backbones\mindcv_models\mobilenet_v1.py, lines 165-176
@register_model
def mobilenet_v1_075_224(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> MobileNetV1:
    """Get MobileNetV1 model with width scaled by 0.75.
    Refer to the base class `models.MobileNetV1` for more details.
    """
    default_cfg = default_cfgs["mobilenet_v1_0.75_224"]
    model = MobileNetV1(alpha=0.75, in_channels=in_channels, num_classes=num_classes, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.mobilenet_v1.mobilenet_v1_100_224(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get MobileNetV1 model without width scaling. Refer to the base class models.MobileNetV1 for more details.

Source code in mindocr\models\backbones\mindcv_models\mobilenet_v1.py, lines 179-190
@register_model
def mobilenet_v1_100_224(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> MobileNetV1:
    """Get MobileNetV1 model without width scaling.
    Refer to the base class `models.MobileNetV1` for more details.
    """
    default_cfg = default_cfgs["mobilenet_v1_1.0_224"]
    model = MobileNetV1(alpha=1.0, in_channels=in_channels, num_classes=num_classes, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.mobilenet_v2

MindSpore implementation of MobileNetV2. Refer to MobileNetV2: Inverted Residuals and Linear Bottlenecks.

mindocr.models.backbones.mindcv_models.mobilenet_v2.InvertedResidual

Bases: nn.Cell

Inverted Residual Block of MobileNetV2

Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py, lines 122-159
class InvertedResidual(nn.Cell):
    """Inverted Residual Block of MobileNetV2"""

    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        stride: int,
        expand_ratio: int,
    ) -> None:
        super().__init__()
        assert stride in [1, 2]
        hidden_dim = round(in_channels * expand_ratio)
        self.use_res_connect = stride == 1 and in_channels == out_channels

        layers = []
        if expand_ratio != 1:
            # pw
            layers.extend([
                nn.Conv2d(in_channels, hidden_dim, 1, 1, pad_mode="pad", padding=0, has_bias=False),
                nn.BatchNorm2d(hidden_dim),
                nn.ReLU6()
            ])
        layers.extend([
            # dw
            nn.Conv2d(hidden_dim, hidden_dim, 3, stride, pad_mode="pad", padding=1, group=hidden_dim, has_bias=False),
            nn.BatchNorm2d(hidden_dim),
            nn.ReLU6(),
            # pw-linear
            nn.Conv2d(hidden_dim, out_channels, 1, 1, pad_mode="pad", padding=0, has_bias=False),
            nn.BatchNorm2d(out_channels),
        ])
        self.layers = nn.SequentialCell(layers)

    def construct(self, x: Tensor) -> Tensor:
        if self.use_res_connect:
            return x + self.layers(x)
        return self.layers(x)
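Whether the residual shortcut is used depends only on the stride and on the channels matching; the hidden width is the input width times the expansion ratio. A small sketch of that bookkeeping, mirroring the `__init__` above:

```python
def inverted_residual_spec(in_channels, out_channels, stride, expand_ratio):
    # hidden_dim is the expanded width; the shortcut is only valid when
    # the block preserves both spatial size (stride 1) and channel count.
    hidden_dim = round(in_channels * expand_ratio)
    use_res = stride == 1 and in_channels == out_channels
    return hidden_dim, use_res

print(inverted_residual_spec(32, 32, 1, 6))  # (192, True)
print(inverted_residual_spec(32, 64, 2, 6))  # (192, False)
```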
mindocr.models.backbones.mindcv_models.mobilenet_v2.MobileNetV2

Bases: nn.Cell

MobileNetV2 model class, based on "MobileNetV2: Inverted Residuals and Linear Bottlenecks" (https://arxiv.org/abs/1801.04381)

PARAMETER DESCRIPTION
alpha (float): scale factor of model width. Default: 1.0.
round_nearest (int): divisor of make divisible function. Default: 8.
in_channels (int): number of the channels of the input. Default: 3.
num_classes (int): number of classification classes. Default: 1000.

Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
class MobileNetV2(nn.Cell):
    r"""MobileNetV2 model class, based on
    `"MobileNetV2: Inverted Residuals and Linear Bottlenecks" <https://arxiv.org/abs/1801.04381>`_

    Args:
        alpha: scale factor of model width. Default: 1.
        round_nearest: divisor of make divisible function. Default: 8.
        in_channels: number of input channels. Default: 3.
        num_classes: number of classification classes. Default: 1000.
    """

    def __init__(
        self,
        alpha: float = 1.0,
        round_nearest: int = 8,
        in_channels: int = 3,
        num_classes: int = 1000,
    ) -> None:
        super().__init__()
        input_channels = make_divisible(32 * alpha, round_nearest)
        # Setting of inverted residual blocks.
        # t: The expansion factor.
        # c: Number of output channels.
        # n: Number of blocks.
        # s: Stride of the first block.
        inverted_residual_setting = [
            # t, c, n, s
            [1, 16, 1, 1],
            [6, 24, 2, 2],
            [6, 32, 3, 2],
            [6, 64, 4, 2],
            [6, 96, 3, 1],
            [6, 160, 3, 2],
            [6, 320, 1, 1],
        ]
        last_channels = make_divisible(1280 * max(1.0, alpha), round_nearest)

        # Building stem conv layer.
        features = [
            nn.Conv2d(in_channels, input_channels, 3, 2, pad_mode="pad", padding=1, has_bias=False),
            nn.BatchNorm2d(input_channels),
            nn.ReLU6(),
        ]
        # Building inverted residual blocks.
        for t, c, n, s in inverted_residual_setting:
            output_channel = make_divisible(c * alpha, round_nearest)
            for i in range(n):
                stride = s if i == 0 else 1
                features.append(InvertedResidual(input_channels, output_channel, stride, expand_ratio=t))
                input_channels = output_channel
        # Building last point-wise layers.
        features.extend([
            nn.Conv2d(input_channels, last_channels, 1, 1, pad_mode="pad", padding=0, has_bias=False),
            nn.BatchNorm2d(last_channels),
            nn.ReLU6(),
        ])
        self.features = nn.SequentialCell(features)

        self.pool = GlobalAvgPooling()
        self.classifier = nn.SequentialCell([
            nn.Dropout(keep_prob=0.8),  # confirmed by paper authors
            nn.Dense(last_channels, num_classes),
        ])
        self._initialize_weights()

    def _initialize_weights(self) -> None:
        """Initialize weights for cells."""
        for _, cell in self.cells_and_names():
            if isinstance(cell, nn.Conv2d):
                n = cell.kernel_size[0] * cell.kernel_size[1] * cell.out_channels
                cell.weight.set_data(
                    init.initializer(init.Normal(sigma=math.sqrt(2. / n), mean=0.0),
                                     cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(init.initializer("zeros", cell.bias.shape, cell.bias.dtype))
            elif isinstance(cell, nn.BatchNorm2d):
                cell.gamma.set_data(init.initializer("ones", cell.gamma.shape, cell.gamma.dtype))
                cell.beta.set_data(init.initializer("zeros", cell.beta.shape, cell.beta.dtype))
            elif isinstance(cell, nn.Dense):
                cell.weight.set_data(
                    init.initializer(init.Normal(sigma=0.01, mean=0.0), cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(init.initializer("zeros", cell.bias.shape, cell.bias.dtype))

    def forward_features(self, x: Tensor) -> Tensor:
        x = self.features(x)
        return x

    def forward_head(self, x: Tensor) -> Tensor:
        x = self.pool(x)
        x = self.classifier(x)
        return x

    def construct(self, x: Tensor) -> Tensor:
        x = self.forward_features(x)
        x = self.forward_head(x)
        return x
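The constructor's channel widths all flow through `make_divisible`, which is imported from elsewhere in mindcv and not shown in this excerpt. The sketch below assumes the conventional MobileNet rounding rule (round to the nearest multiple of the divisor, but never drop below 90% of the requested width) and replays the stem, per-stage, and head widths for a given `alpha`; `channel_plan` is a hypothetical helper for illustration only.

```python
def make_divisible(v: float, divisor: int = 8, min_value: int = None) -> int:
    """Round v to the nearest multiple of divisor, never below 90% of v.
    Assumed to match the standard MobileNet make_divisible (not shown above)."""
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

def channel_plan(alpha: float, round_nearest: int = 8):
    """Replay the width computation from MobileNetV2.__init__."""
    setting = [  # t, c, n, s (only c matters for widths)
        [1, 16, 1, 1], [6, 24, 2, 2], [6, 32, 3, 2], [6, 64, 4, 2],
        [6, 96, 3, 1], [6, 160, 3, 2], [6, 320, 1, 1],
    ]
    stem = make_divisible(32 * alpha, round_nearest)
    stages = [make_divisible(c * alpha, round_nearest) for _, c, _, _ in setting]
    # The head is never narrowed below 1280, even for alpha < 1.
    last = make_divisible(1280 * max(1.0, alpha), round_nearest)
    return stem, stages, last
```

For `alpha=1.0` this reproduces the familiar widths 32, [16, 24, 32, 64, 96, 160, 320], 1280; for `alpha < 1` the 90% floor in `make_divisible` can round a width up rather than down.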
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_035_128(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get MobileNetV2 model with width scaled by 0.35 and input image size of 128. Refer to the base class models.MobileNetV2 for more details.

Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
@register_model
def mobilenet_v2_035_128(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> MobileNetV2:
    """Get MobileNetV2 model with width scaled by 0.35 and input image size of 128.
    Refer to the base class `models.MobileNetV2` for more details.
    """
    default_cfg = default_cfgs["mobilenet_v2_0.35_128"]
    model = MobileNetV2(alpha=0.35, num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_035_160(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get MobileNetV2 model with width scaled by 0.35 and input image size of 160. Refer to the base class models.MobileNetV2 for more details.

Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
@register_model
def mobilenet_v2_035_160(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> MobileNetV2:
    """Get MobileNetV2 model with width scaled by 0.35 and input image size of 160.
    Refer to the base class `models.MobileNetV2` for more details.
    """
    default_cfg = default_cfgs["mobilenet_v2_0.35_160"]
    model = MobileNetV2(alpha=0.35, num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_035_192(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get MobileNetV2 model with width scaled by 0.35 and input image size of 192. Refer to the base class models.MobileNetV2 for more details.

Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
@register_model
def mobilenet_v2_035_192(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> MobileNetV2:
    """Get MobileNetV2 model with width scaled by 0.35 and input image size of 192.
    Refer to the base class `models.MobileNetV2` for more details.
    """
    default_cfg = default_cfgs["mobilenet_v2_0.35_192"]
    model = MobileNetV2(alpha=0.35, num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_035_224(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get MobileNetV2 model with width scaled by 0.35 and input image size of 224. Refer to the base class models.MobileNetV2 for more details.

Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
@register_model
def mobilenet_v2_035_224(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> MobileNetV2:
    """Get MobileNetV2 model with width scaled by 0.35 and input image size of 224.
    Refer to the base class `models.MobileNetV2` for more details.
    """
    default_cfg = default_cfgs["mobilenet_v2_0.35_224"]
    model = MobileNetV2(alpha=0.35, num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_035_96(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get MobileNetV2 model with width scaled by 0.35 and input image size of 96. Refer to the base class models.MobileNetV2 for more details.

Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
@register_model
def mobilenet_v2_035_96(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> MobileNetV2:
    """Get MobileNetV2 model with width scaled by 0.35 and input image size of 96.
    Refer to the base class `models.MobileNetV2` for more details.
    """
    default_cfg = default_cfgs["mobilenet_v2_0.35_96"]
    model = MobileNetV2(alpha=0.35, num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_050_128(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get MobileNetV2 model with width scaled by 0.5 and input image size of 128. Refer to the base class models.MobileNetV2 for more details.

Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
@register_model
def mobilenet_v2_050_128(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> MobileNetV2:
    """Get MobileNetV2 model with width scaled by 0.5 and input image size of 128.
    Refer to the base class `models.MobileNetV2` for more details.
    """
    default_cfg = default_cfgs["mobilenet_v2_0.5_128"]
    model = MobileNetV2(alpha=0.5, num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_050_160(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get MobileNetV2 model with width scaled by 0.5 and input image size of 160. Refer to the base class models.MobileNetV2 for more details.

Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
@register_model
def mobilenet_v2_050_160(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> MobileNetV2:
    """Get MobileNetV2 model with width scaled by 0.5 and input image size of 160.
    Refer to the base class `models.MobileNetV2` for more details.
    """
    default_cfg = default_cfgs["mobilenet_v2_0.5_160"]
    model = MobileNetV2(alpha=0.5, num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_050_192(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get MobileNetV2 model with width scaled by 0.5 and input image size of 192. Refer to the base class models.MobileNetV2 for more details.

Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
@register_model
def mobilenet_v2_050_192(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> MobileNetV2:
    """Get MobileNetV2 model with width scaled by 0.5 and input image size of 192.
    Refer to the base class `models.MobileNetV2` for more details.
    """
    default_cfg = default_cfgs["mobilenet_v2_0.5_192"]
    model = MobileNetV2(alpha=0.5, num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_050_224(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get MobileNetV2 model with width scaled by 0.5 and input image size of 224. Refer to the base class models.MobileNetV2 for more details.

Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
@register_model
def mobilenet_v2_050_224(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> MobileNetV2:
    """Get MobileNetV2 model with width scaled by 0.5 and input image size of 224.
    Refer to the base class `models.MobileNetV2` for more details.
    """
    default_cfg = default_cfgs["mobilenet_v2_0.5_224"]
    model = MobileNetV2(alpha=0.5, num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_050_96(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get MobileNetV2 model with width scaled by 0.5 and input image size of 96. Refer to the base class models.MobileNetV2 for more details.

Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
@register_model
def mobilenet_v2_050_96(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> MobileNetV2:
    """Get MobileNetV2 model with width scaled by 0.5 and input image size of 96.
    Refer to the base class `models.MobileNetV2` for more details.
    """
    default_cfg = default_cfgs["mobilenet_v2_0.5_96"]
    model = MobileNetV2(alpha=0.5, num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_075_128(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get MobileNetV2 model with width scaled by 0.75 and input image size of 128. Refer to the base class models.MobileNetV2 for more details.

Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
@register_model
def mobilenet_v2_075_128(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> MobileNetV2:
    """Get MobileNetV2 model with width scaled by 0.75 and input image size of 128.
    Refer to the base class `models.MobileNetV2` for more details.
    """
    default_cfg = default_cfgs["mobilenet_v2_0.75_128"]
    model = MobileNetV2(alpha=0.75, num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_075_160(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get MobileNetV2 model with width scaled by 0.75 and input image size of 160. Refer to the base class models.MobileNetV2 for more details.

Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
@register_model
def mobilenet_v2_075_160(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> MobileNetV2:
    """Get MobileNetV2 model with width scaled by 0.75 and input image size of 160.
    Refer to the base class `models.MobileNetV2` for more details.
    """
    default_cfg = default_cfgs["mobilenet_v2_0.75_160"]
    model = MobileNetV2(alpha=0.75, num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_075_192(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get MobileNetV2 model with width scaled by 0.75 and input image size of 192. Refer to the base class models.MobileNetV2 for more details.

Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
@register_model
def mobilenet_v2_075_192(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> MobileNetV2:
    """Get MobileNetV2 model with width scaled by 0.75 and input image size of 192.
    Refer to the base class `models.MobileNetV2` for more details.
    """
    default_cfg = default_cfgs["mobilenet_v2_0.75_192"]
    model = MobileNetV2(alpha=0.75, num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_075_224(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get MobileNetV2 model with width scaled by 0.75 and input image size of 224. Refer to the base class models.MobileNetV2 for more details.

Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
@register_model
def mobilenet_v2_075_224(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> MobileNetV2:
    """Get MobileNetV2 model with width scaled by 0.75 and input image size of 224.
    Refer to the base class `models.MobileNetV2` for more details.
    """
    default_cfg = default_cfgs["mobilenet_v2_0.75_224"]
    model = MobileNetV2(alpha=0.75, num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_075_96(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get MobileNetV2 model with width scaled by 0.75 and input image size of 96. Refer to the base class models.MobileNetV2 for more details.

Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
@register_model
def mobilenet_v2_075_96(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> MobileNetV2:
    """Get MobileNetV2 model with width scaled by 0.75 and input image size of 96.
    Refer to the base class `models.MobileNetV2` for more details.
    """
    default_cfg = default_cfgs["mobilenet_v2_0.75_96"]
    model = MobileNetV2(alpha=0.75, num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_100_128(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get MobileNetV2 model without width scaling and input image size of 128. Refer to the base class models.MobileNetV2 for more details.

Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
@register_model
def mobilenet_v2_100_128(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> MobileNetV2:
    """Get MobileNetV2 model without width scaling and input image size of 128.
    Refer to the base class `models.MobileNetV2` for more details.
    """
    default_cfg = default_cfgs["mobilenet_v2_1.0_128"]
    model = MobileNetV2(alpha=1.0, num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_100_160(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get MobileNetV2 model without width scaling and input image size of 160. Refer to the base class models.MobileNetV2 for more details.

Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
@register_model
def mobilenet_v2_100_160(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> MobileNetV2:
    """Get MobileNetV2 model without width scaling and input image size of 160.
    Refer to the base class `models.MobileNetV2` for more details.
    """
    default_cfg = default_cfgs["mobilenet_v2_1.0_160"]
    model = MobileNetV2(alpha=1.0, num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_100_192(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get MobileNetV2 model without width scaling and input image size of 192. Refer to the base class models.MobileNetV2 for more details.

Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
@register_model
def mobilenet_v2_100_192(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> MobileNetV2:
    """Get MobileNetV2 model without width scaling and input image size of 192.
    Refer to the base class `models.MobileNetV2` for more details.
    """
    default_cfg = default_cfgs["mobilenet_v2_1.0_192"]
    model = MobileNetV2(alpha=1.0, num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_100_224(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get MobileNetV2 model without width scaling and input image size of 224. Refer to the base class models.MobileNetV2 for more details.

Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
@register_model
def mobilenet_v2_100_224(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> MobileNetV2:
    """Get MobileNetV2 model without width scaling and input image size of 224.
    Refer to the base class `models.MobileNetV2` for more details.
    """
    default_cfg = default_cfgs["mobilenet_v2_1.0_224"]
    model = MobileNetV2(alpha=1.0, num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_100_96(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get MobileNetV2 model without width scaling and input image size of 96. Refer to the base class models.MobileNetV2 for more details.

Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
@register_model
def mobilenet_v2_100_96(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> MobileNetV2:
    """Get MobileNetV2 model without width scaling and input image size of 96.
    Refer to the base class `models.MobileNetV2` for more details.
    """
    default_cfg = default_cfgs["mobilenet_v2_1.0_96"]
    model = MobileNetV2(alpha=1.0, num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_130_224(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get MobileNetV2 model with width scaled by 1.3 and input image size of 224. Refer to the base class models.MobileNetV2 for more details.

Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
@register_model
def mobilenet_v2_130_224(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> MobileNetV2:
    """Get MobileNetV2 model with width scaled by 1.3 and input image size of 224.
    Refer to the base class `models.MobileNetV2` for more details.
    """
    default_cfg = default_cfgs["mobilenet_v2_1.3_224"]
    model = MobileNetV2(alpha=1.3, num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.mobilenet_v2.mobilenet_v2_140_224(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get MobileNetV2 model with width scaled by 1.4 and input image size of 224. Refer to the base class models.MobileNetV2 for more details.

Source code in mindocr\models\backbones\mindcv_models\mobilenet_v2.py
@register_model
def mobilenet_v2_140_224(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> MobileNetV2:
    """Get MobileNetV2 model with width scaled by 1.4 and input image size of 224.
    Refer to the base class `models.MobileNetV2` for more details.
    """
    default_cfg = default_cfgs["mobilenet_v2_1.4_224"]
    model = MobileNetV2(alpha=1.4, num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
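All of these factory functions follow one naming convention: the function name encodes the width multiplier as a zero-padded percentage and the input size, while the `default_cfgs` key keeps the multiplier as a decimal. A small stand-alone sketch of that mapping (`variant_to_config` is a hypothetical helper, not part of mindocr):

```python
def variant_to_config(alpha: float, image_size: int) -> tuple:
    """Derive the factory-function name and default_cfgs key for a
    (width multiplier, input size) pair, following the convention above."""
    # round() guards against float artifacts like 0.35 * 100 == 34.999...
    fn_name = f"mobilenet_v2_{round(alpha * 100):03d}_{image_size}"
    cfg_key = f"mobilenet_v2_{alpha}_{image_size}"
    return fn_name, cfg_key
```

For example, `alpha=0.35` with a 128-pixel input maps to `mobilenet_v2_035_128` and the config key `mobilenet_v2_0.35_128`.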
mindocr.models.backbones.mindcv_models.mobilenet_v3

MindSpore implementation of MobileNetV3. Refer to "Searching for MobileNetV3" (https://arxiv.org/abs/1905.02244).

mindocr.models.backbones.mindcv_models.mobilenet_v3.Bottleneck

Bases: nn.Cell

Bottleneck block of MobileNetV3: depth-wise separable convolution + inverted residual + squeeze-and-excitation.

Source code in mindocr\models\backbones\mindcv_models\mobilenet_v3.py
class Bottleneck(nn.Cell):
    """Bottleneck Block of MobilenetV3. depth-wise separable convolutions + inverted residual + squeeze excitation"""

    def __init__(
            self,
            in_channels: int,
            mid_channels: int,
            out_channels: int,
            kernel_size: int,
            stride: int = 1,
            activation: str = "relu",
            use_se: bool = False,
            se_version: str = 'SqueezeExcite',
            always_expand: bool = False
    ) -> None:
        super().__init__()
        self.use_res_connect = stride == 1 and in_channels == out_channels
        assert activation in ["relu", "hswish"]
        self.activation = nn.HSwish if activation == "hswish" else nn.ReLU

        layers = []
        # Expand.
        if in_channels != mid_channels or always_expand:
            layers.extend([
                nn.Conv2d(in_channels, mid_channels, 1, 1, pad_mode="pad", padding=0, has_bias=False),
                nn.BatchNorm2d(mid_channels),
                self.activation(),
            ])

        # DepthWise.
        layers.extend([
            nn.Conv2d(mid_channels, mid_channels, kernel_size, stride,
                      pad_mode="same", group=mid_channels, has_bias=False),
            nn.BatchNorm2d(mid_channels),
            self.activation(),
        ])
        # SqueezeExcitation.
        if use_se and se_version == 'SqueezeExcite':
            layers.append(SqueezeExcite(mid_channels, 1.0 / 4, act_layer=nn.ReLU, gate_layer=nn.HSigmoid))
        elif use_se and se_version == 'SqueezeExciteV2':
            layers.append(SqueezeExciteV2(mid_channels, rd_channels=mid_channels // 4))

        # Project.
        layers.extend([
            nn.Conv2d(mid_channels, out_channels, 1, 1, pad_mode="pad", padding=0, has_bias=False),
            nn.BatchNorm2d(out_channels),
        ])
        self.layers = nn.SequentialCell(layers)

    def construct(self, x: Tensor) -> Tensor:
        if self.use_res_connect:
            return x + self.layers(x)
        return self.layers(x)
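Unlike MobileNetV2's `InvertedResidual`, this block skips the 1x1 expansion only when the expanded width already equals the input width (unless `always_expand` forces it). The plain-Python sketch below (`plan_bottleneck` is a hypothetical helper) mirrors that decision and the residual condition.

```python
def plan_bottleneck(in_channels: int, mid_channels: int, out_channels: int,
                    stride: int = 1, always_expand: bool = False):
    """Mirror Bottleneck.__init__: which conv stages are built, and whether
    the residual shortcut applies."""
    use_res_connect = stride == 1 and in_channels == out_channels
    stages = []
    if in_channels != mid_channels or always_expand:
        stages.append("expand")            # 1x1 conv to mid_channels
    stages += ["depthwise", "project"]     # k x k depthwise conv, then 1x1 linear conv
    return stages, use_res_connect

# First "small" arch block [3, 16, 16, ...]: no expansion needed (16 -> 16).
stages, res = plan_bottleneck(16, 16, 16, stride=2)
```

The optional squeeze-and-excitation module would be inserted between "depthwise" and "project" when `use_se` is set.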
mindocr.models.backbones.mindcv_models.mobilenet_v3.MobileNetV3

Bases: nn.Cell

MobileNetV3 model class, based on "Searching for MobileNetV3" (https://arxiv.org/abs/1905.02244).

PARAMETER DESCRIPTION
arch

size of the architecture. 'small' or 'large'.

TYPE: str

alpha

scale factor of model width. Default: 1.

TYPE: float DEFAULT: 1.0

round_nearest

divisor of make divisible function. Default: 8.

TYPE: int DEFAULT: 8

in_channels

number of input channels. Default: 3.

TYPE: int DEFAULT: 3

num_classes

number of classification classes. Default: 1000.

TYPE: int DEFAULT: 1000

Source code in mindocr\models\backbones\mindcv_models\mobilenet_v3.py
class MobileNetV3(nn.Cell):
    r"""MobileNetV3 model class, based on
    `"Searching for MobileNetV3" <https://arxiv.org/abs/1905.02244>`_

    Args:
        arch: size of the architecture. 'small' or 'large'.
        alpha: scale factor of model width. Default: 1.
        round_nearest: divisor of make divisible function. Default: 8.
        in_channels: number of input channels. Default: 3.
        num_classes: number of classification classes. Default: 1000.
    """

    def __init__(
            self,
            arch: str,
            alpha: float = 1.0,
            round_nearest: int = 8,
            in_channels: int = 3,
            num_classes: int = 1000,
            scale_last: bool = True,
            bottleneck_params: dict = None
    ) -> None:
        super().__init__()
        input_channels = make_divisible(16 * alpha, round_nearest)
        # Setting of bottleneck blocks. ex: [k, e, c, se, nl, s]
        # k: kernel size of depth-wise conv
        # e: expansion size
        # c: number of output channel
        # se: whether there is a Squeeze-And-Excite in that block
        # nl: type of non-linearity used
        # s: stride of depth-wise conv
        if arch == "large":
            bottleneck_setting = [
                [3, 16, 16, False, "relu", 1],
                [3, 64, 24, False, "relu", 2],
                [3, 72, 24, False, "relu", 1],
                [5, 72, 40, True, "relu", 2],
                [5, 120, 40, True, "relu", 1],
                [5, 120, 40, True, "relu", 1],
                [3, 240, 80, False, "hswish", 2],
                [3, 200, 80, False, "hswish", 1],
                [3, 184, 80, False, "hswish", 1],
                [3, 184, 80, False, "hswish", 1],
                [3, 480, 112, True, "hswish", 1],
                [3, 672, 112, True, "hswish", 1],
                [5, 672, 160, True, "hswish", 2],
                [5, 960, 160, True, "hswish", 1],
                [5, 960, 160, True, "hswish", 1],
            ]
            last_channels = make_divisible(alpha * 1280, round_nearest) if scale_last else 1280
        elif arch == "small":
            bottleneck_setting = [
                [3, 16, 16, True, "relu", 2],
                [3, 72, 24, False, "relu", 2],
                [3, 88, 24, False, "relu", 1],
                [5, 96, 40, True, "hswish", 2],
                [5, 240, 40, True, "hswish", 1],
                [5, 240, 40, True, "hswish", 1],
                [5, 120, 48, True, "hswish", 1],
                [5, 144, 48, True, "hswish", 1],
                [5, 288, 96, True, "hswish", 2],
                [5, 576, 96, True, "hswish", 1],
                [5, 576, 96, True, "hswish", 1],
            ]
            last_channels = make_divisible(alpha * 1024, round_nearest) if scale_last else 1024
        else:
            raise ValueError(f"Unsupported model type {arch}")

        # Building stem conv layer.
        features = [
            nn.Conv2d(in_channels, input_channels, 3, 2, pad_mode="pad", padding=1, has_bias=False),
            nn.BatchNorm2d(input_channels),
            nn.HSwish(),
        ]
        total_reduction = 2
        self.feature_info = [dict(chs=input_channels, reduction=total_reduction, name=f'features.{len(features) - 1}')]

        if bottleneck_params is None:
            bottleneck_params = {}

        # Building bottleneck blocks.
        for k, e, c, se, nl, s in bottleneck_setting:
            exp_channels = make_divisible(alpha * e, round_nearest)
            output_channels = make_divisible(alpha * c, round_nearest)
            features.append(Bottleneck(input_channels, exp_channels, output_channels,
                                       kernel_size=k, stride=s, activation=nl, use_se=se, **bottleneck_params))
            input_channels = output_channels
            total_reduction *= s
            self.feature_info.append(
                dict(chs=input_channels, reduction=total_reduction, name=f'features.{len(features) - 1}'))
        # Building last point-wise conv layers.
        output_channels = input_channels * 6
        features.extend([
            nn.Conv2d(input_channels, output_channels, 1, 1, pad_mode="pad", padding=0, has_bias=False),
            nn.BatchNorm2d(output_channels),
            nn.HSwish(),
        ])
        self.feature_info.append(
            dict(chs=output_channels, reduction=total_reduction, name=f'features.{len(features) - 1}'))
        self.flatten_sequential = True
        self.features = nn.CellList(features)

        self.pool = GlobalAvgPooling()
        self.classifier = nn.SequentialCell([
            nn.Dense(output_channels, last_channels),
            nn.HSwish(),
            nn.Dropout(keep_prob=0.8),
            nn.Dense(last_channels, num_classes),
        ])
        self._initialize_weights()

    def _initialize_weights(self) -> None:
        """Initialize weights for cells."""
        for _, cell in self.cells_and_names():
            if isinstance(cell, nn.Conv2d):
                n = cell.kernel_size[0] * cell.kernel_size[1] * cell.out_channels
                cell.weight.set_data(
                    init.initializer(init.Normal(sigma=math.sqrt(2. / n), mean=0.0),
                                     cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(init.initializer("zeros", cell.bias.shape, cell.bias.dtype))
            elif isinstance(cell, nn.BatchNorm2d):
                cell.gamma.set_data(init.initializer("ones", cell.gamma.shape, cell.gamma.dtype))
                cell.beta.set_data(init.initializer("zeros", cell.beta.shape, cell.beta.dtype))
            elif isinstance(cell, nn.Dense):
                cell.weight.set_data(
                    init.initializer(init.Normal(sigma=0.01, mean=0.0), cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(init.initializer("zeros", cell.bias.shape, cell.bias.dtype))

    def forward_features(self, x: Tensor) -> Tensor:
        for feature in self.features:
            x = feature(x)
        return x

    def forward_head(self, x: Tensor) -> Tensor:
        x = self.pool(x)
        x = self.classifier(x)
        return x

    def construct(self, x: Tensor) -> Tensor:
        x = self.forward_features(x)
        x = self.forward_head(x)
        return x
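Every channel width in the settings above passes through `make_divisible`, which snaps the `alpha`-scaled count to a multiple of `round_nearest` while never shrinking it by more than 10%. The helper below is a pure-Python sketch of the common MobileNet rounding rule (the actual implementation is imported from elsewhere in `mindcv_models` and may differ in detail):

```python
def make_divisible(v: float, divisor: int = 8, min_value: int = None) -> int:
    """Round v to the nearest multiple of divisor, never going below
    90% of the original value (the standard MobileNet rounding rule)."""
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:  # avoid shrinking the channel count by more than 10%
        new_v += divisor
    return new_v

# Width scaling for alpha = 0.75:
print(make_divisible(16 * 0.75, 8))  # 12 rounds up to 16
print(make_divisible(24 * 0.75, 8))  # 18 rounds to 16, but 16 < 0.9*18, so 24
```

Note the second case: rounding 18 down to 16 would lose more than 10% of the channels, so the rule bumps the result up to 24.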
mindocr.models.backbones.mindcv_models.mobilenet_v3.mobilenet_v3_large_075(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get large MobileNetV3 model with width scaled by 0.75. Refer to the base class models.MobileNetV3 for more details.

Source code in mindocr\models\backbones\mindcv_models\mobilenet_v3.py
@register_model
def mobilenet_v3_large_075(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> MobileNetV3:
    """Get large MobileNetV3 model with width scaled by 0.75.
    Refer to the base class `models.MobileNetV3` for more details.
    """
    default_cfg = default_cfgs["mobilenet_v3_large_0.75"]
    model = MobileNetV3(arch="large", alpha=0.75, in_channels=in_channels, num_classes=num_classes, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.mobilenet_v3.mobilenet_v3_large_100(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get large MobileNetV3 model without width scaling. Refer to the base class models.MobileNetV3 for more details.

Source code in mindocr\models\backbones\mindcv_models\mobilenet_v3.py
@register_model
def mobilenet_v3_large_100(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> MobileNetV3:
    """Get large MobileNetV3 model without width scaling.
    Refer to the base class `models.MobileNetV3` for more details.
    """
    default_cfg = default_cfgs["mobilenet_v3_large_1.0"]
    model = MobileNetV3(arch="large", alpha=1.0, in_channels=in_channels, num_classes=num_classes, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.mobilenet_v3.mobilenet_v3_small_075(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get small MobileNetV3 model with width scaled by 0.75. Refer to the base class models.MobileNetV3 for more details.

Source code in mindocr\models\backbones\mindcv_models\mobilenet_v3.py
@register_model
def mobilenet_v3_small_075(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> MobileNetV3:
    """Get small MobileNetV3 model with width scaled by 0.75.
    Refer to the base class `models.MobileNetV3` for more details.
    """
    default_cfg = default_cfgs["mobilenet_v3_small_0.75"]
    model = MobileNetV3(arch="small", alpha=0.75, in_channels=in_channels, num_classes=num_classes, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.mobilenet_v3.mobilenet_v3_small_100(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get small MobileNetV3 model without width scaling. Refer to the base class models.MobileNetV3 for more details.

Source code in mindocr\models\backbones\mindcv_models\mobilenet_v3.py
@register_model
def mobilenet_v3_small_100(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> MobileNetV3:
    """Get small MobileNetV3 model without width scaling.
    Refer to the base class `models.MobileNetV3` for more details.
    """
    default_cfg = default_cfgs["mobilenet_v3_small_1.0"]
    model = MobileNetV3(arch="small", alpha=1.0, in_channels=in_channels, num_classes=num_classes, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.model_factory
mindocr.models.backbones.mindcv_models.model_factory.create_model(model_name, num_classes=1000, pretrained=False, in_channels=3, checkpoint_path='', ema=False, features_only=False, out_indices=[0, 1, 2, 3, 4], **kwargs)

Creates model by name.

PARAMETER DESCRIPTION
model_name

The name of the model.

TYPE: str

num_classes

The number of classes. Default: 1000.

TYPE: int DEFAULT: 1000

pretrained

Whether to load the pretrained model. Default: False.

TYPE: bool DEFAULT: False

in_channels

The input channels. Default: 3.

TYPE: int DEFAULT: 3

checkpoint_path

The path of checkpoint files. Default: "".

TYPE: str DEFAULT: ''

ema

Whether to load the EMA (exponential moving average) weights. Default: False.

TYPE: bool DEFAULT: False

features_only

Output intermediate feature maps at different strides instead of the final logits. Default: False.

TYPE: bool DEFAULT: False

out_indices

The indices of the output features when features_only is True. Default: [0, 1, 2, 3, 4].

TYPE: list[int] DEFAULT: [0, 1, 2, 3, 4]

Source code in mindocr\models\backbones\mindcv_models\model_factory.py
def create_model(
    model_name: str,
    num_classes: int = 1000,
    pretrained=False,
    in_channels: int = 3,
    checkpoint_path: str = "",
    ema: bool = False,
    features_only: bool = False,
    out_indices: List[int] = [0, 1, 2, 3, 4],
    **kwargs,
):
    r"""Creates model by name.

    Args:
        model_name (str): The name of the model.
        num_classes (int): The number of classes. Default: 1000.
        pretrained (bool): Whether to load the pretrained model. Default: False.
        in_channels (int): The number of input channels. Default: 3.
        checkpoint_path (str): The path of checkpoint files. Default: "".
        ema (bool): Whether to load the EMA (exponential moving average) weights. Default: False.
        features_only (bool): Output intermediate feature maps at different strides
            instead of the final logits. Default: False.
        out_indices (list[int]): The indices of the output features when `features_only` is `True`.
            Default: [0, 1, 2, 3, 4]
    """

    if checkpoint_path != "" and pretrained:
        raise ValueError("checkpoint_path is mutually exclusive with pretrained")

    model_args = dict(num_classes=num_classes, pretrained=pretrained, in_channels=in_channels)
    kwargs = {k: v for k, v in kwargs.items() if v is not None}

    if not is_model(model_name):
        raise RuntimeError(f"Unknown model {model_name}")

    create_fn = model_entrypoint(model_name)
    model = create_fn(**model_args, **kwargs)

    if os.path.exists(checkpoint_path):
        checkpoint_param = load_checkpoint(checkpoint_path)
        ema_param_dict = dict()
        for param in checkpoint_param:
            if param.startswith("ema"):
                new_name = param.split("ema.")[1]
                ema_data = checkpoint_param[param]
                ema_data.name = new_name
                ema_param_dict[new_name] = ema_data

        if ema_param_dict and ema:
            load_param_into_net(model, ema_param_dict)
        elif bool(ema_param_dict) is False and ema:
            raise ValueError("checkpoint_param does not contain EMA parameters, please set ema=False.")
        else:
            load_param_into_net(model, checkpoint_param)

    if features_only:
        # wrap the model, output the feature pyramid instead
        try:
            model = FeatureExtractWrapper(model, out_indices=out_indices)
        except AttributeError as e:
            raise RuntimeError(f"`features_only` is not implemented for `{model_name}` model.") from e

    return model
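`create_model` resolves a name through a registry that the `@register_model` decorator populates; `is_model` and `model_entrypoint` query the same table. A stripped-down sketch of that pattern (hypothetical names and a dummy constructor, mirroring but not reproducing the real registry module):

```python
# Minimal model-registry sketch; the real registry lives in the
# mindcv_models package and stores actual constructor functions.
_model_entrypoints = {}

def register_model(fn):
    """Decorator: record a model constructor under its function name."""
    _model_entrypoints[fn.__name__] = fn
    return fn

def is_model(model_name: str) -> bool:
    return model_name in _model_entrypoints

def model_entrypoint(model_name: str):
    return _model_entrypoints[model_name]

@register_model
def tiny_net(num_classes: int = 10, **kwargs):
    # Stand-in for a real constructor such as mobilenet_v3_small_100.
    return {"arch": "tiny_net", "num_classes": num_classes}

# Lookup and construction, the same two steps create_model performs:
model = model_entrypoint("tiny_net")(num_classes=5)
print(model)  # {'arch': 'tiny_net', 'num_classes': 5}
```

This is why `create_model` raises `RuntimeError` for unknown names: `is_model` simply checks membership in the registry dictionary.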
mindocr.models.backbones.mindcv_models.nasnet

MindSpore implementation of NasNet. Refer to: Learning Transferable Architectures for Scalable Image Recognition

mindocr.models.backbones.mindcv_models.nasnet.BranchSeparables

Bases: nn.Cell

NasNet model basic architecture

Source code in mindocr\models\backbones\mindcv_models\nasnet.py
class BranchSeparables(nn.Cell):
    """NasNet model basic architecture"""

    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        kernel_size: int,
        stride: int,
        padding: int,
        bias: bool = False,
    ) -> None:
        super().__init__()
        self.relu = nn.ReLU()
        self.separable_1 = SeparableConv2d(
            in_channels, in_channels, kernel_size, stride, padding, bias=bias
        )
        self.bn_sep_1 = nn.BatchNorm2d(num_features=in_channels, eps=0.001, momentum=0.9, affine=True)
        self.relu1 = nn.ReLU()
        self.separable_2 = SeparableConv2d(
            in_channels, out_channels, kernel_size, 1, padding, bias=bias
        )
        self.bn_sep_2 = nn.BatchNorm2d(num_features=out_channels, eps=0.001, momentum=0.9, affine=True)

    def construct(self, x: Tensor) -> Tensor:
        x = self.relu(x)
        x = self.separable_1(x)
        x = self.bn_sep_1(x)
        x = self.relu1(x)
        x = self.separable_2(x)
        x = self.bn_sep_2(x)
        return x
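`BranchSeparables` stacks two depthwise-separable convolutions, each factoring a dense k×k convolution into a k×k depthwise pass plus a 1×1 pointwise pass. The parameter saving is easy to verify by counting weights (bias-free, as in the listing above; the 44-channel figure assumes NASNet-A Mobile's base width of 1056 // 24):

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard dense k x k convolution (no bias)."""
    return k * k * c_in * c_out

def separable_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out

# A 5x5 branch with 44 -> 44 channels:
dense = conv_params(44, 44, 5)                 # 48400
separable = separable_conv_params(44, 44, 5)   # 1100 + 1936 = 3036
print(dense, separable, round(dense / separable, 1))  # 48400 3036 15.9
```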
mindocr.models.backbones.mindcv_models.nasnet.BranchSeparablesReduction

Bases: BranchSeparables

NasNet model Residual Connections

Source code in mindocr\models\backbones\mindcv_models\nasnet.py
class BranchSeparablesReduction(BranchSeparables):
    """NasNet model Residual Connections"""

    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        kernel_size: int,
        stride: int,
        padding: int,
        z_padding: int = 1,
        bias: bool = False,
    ) -> None:
        BranchSeparables.__init__(
            self, in_channels, out_channels, kernel_size, stride, padding, bias
        )
        self.padding = nn.Pad(paddings=((0, 0), (0, 0), (z_padding, 0), (z_padding, 0)), mode="CONSTANT")

    def construct(self, x: Tensor) -> Tensor:
        x = self.relu(x)
        x = self.padding(x)
        x = self.separable_1(x)
        x = x[:, :, 1:, 1:]
        x = self.bn_sep_1(x)
        x = self.relu1(x)
        x = self.separable_2(x)
        x = self.bn_sep_2(x)
        return x
mindocr.models.backbones.mindcv_models.nasnet.BranchSeparablesStem

Bases: nn.Cell

NasNet model basic architecture

Source code in mindocr\models\backbones\mindcv_models\nasnet.py
class BranchSeparablesStem(nn.Cell):
    """NasNet model basic architecture"""

    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        kernel_size: int,
        stride: int,
        padding: int,
        bias: bool = False,
    ) -> None:
        super().__init__()
        self.relu = nn.ReLU()
        self.separable_1 = SeparableConv2d(
            in_channels, out_channels, kernel_size, stride, padding, bias=bias
        )
        self.bn_sep_1 = nn.BatchNorm2d(num_features=out_channels, eps=0.001, momentum=0.9, affine=True)
        self.relu1 = nn.ReLU()
        self.separable_2 = SeparableConv2d(
            out_channels, out_channels, kernel_size, 1, padding, bias=bias
        )
        self.bn_sep_2 = nn.BatchNorm2d(num_features=out_channels, eps=0.001, momentum=0.9, affine=True)

    def construct(self, x: Tensor) -> Tensor:
        x = self.relu(x)
        x = self.separable_1(x)
        x = self.bn_sep_1(x)
        x = self.relu1(x)
        x = self.separable_2(x)
        x = self.bn_sep_2(x)
        return x
mindocr.models.backbones.mindcv_models.nasnet.CellStem0

Bases: nn.Cell

NasNet model basic architecture

Source code in mindocr\models\backbones\mindcv_models\nasnet.py
class CellStem0(nn.Cell):
    """NasNet model basic architecture"""

    def __init__(
        self,
        stem_filters: int,
        num_filters: int = 42,
    ) -> None:
        super().__init__()
        self.num_filters = num_filters
        self.stem_filters = stem_filters
        self.conv_1x1 = nn.SequentialCell([
            nn.ReLU(),
            nn.Conv2d(in_channels=self.stem_filters, out_channels=self.num_filters, kernel_size=1, stride=1,
                      pad_mode="pad", has_bias=False),
            nn.BatchNorm2d(num_features=self.num_filters, eps=0.001, momentum=0.9, affine=True)
        ])

        self.comb_iter_0_left = BranchSeparables(
            self.num_filters, self.num_filters, 5, 2, 2
        )
        self.comb_iter_0_right = BranchSeparablesStem(
            self.stem_filters, self.num_filters, 7, 2, 3, bias=False
        )

        self.comb_iter_1_left = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode="same")
        self.comb_iter_1_right = BranchSeparablesStem(
            self.stem_filters, self.num_filters, 7, 2, 3, bias=False
        )

        self.comb_iter_2_left = nn.AvgPool2d(kernel_size=3, stride=2, pad_mode="same")
        self.comb_iter_2_right = BranchSeparablesStem(
            self.stem_filters, self.num_filters, 5, 2, 2, bias=False
        )

        self.comb_iter_3_right = nn.AvgPool2d(kernel_size=3, stride=1, pad_mode="same")

        self.comb_iter_4_left = BranchSeparables(
            self.num_filters, self.num_filters, 3, 1, 1, bias=False
        )
        self.comb_iter_4_right = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode="same")

    def construct(self, x: Tensor) -> Tensor:
        x1 = self.conv_1x1(x)

        x_comb_iter_0_left = self.comb_iter_0_left(x1)
        x_comb_iter_0_right = self.comb_iter_0_right(x)
        x_comb_iter_0 = x_comb_iter_0_left + x_comb_iter_0_right

        x_comb_iter_1_left = self.comb_iter_1_left(x1)
        x_comb_iter_1_right = self.comb_iter_1_right(x)
        x_comb_iter_1 = x_comb_iter_1_left + x_comb_iter_1_right

        x_comb_iter_2_left = self.comb_iter_2_left(x1)
        x_comb_iter_2_right = self.comb_iter_2_right(x)
        x_comb_iter_2 = x_comb_iter_2_left + x_comb_iter_2_right

        x_comb_iter_3_right = self.comb_iter_3_right(x_comb_iter_0)
        x_comb_iter_3 = x_comb_iter_3_right + x_comb_iter_1

        x_comb_iter_4_left = self.comb_iter_4_left(x_comb_iter_0)
        x_comb_iter_4_right = self.comb_iter_4_right(x1)
        x_comb_iter_4 = x_comb_iter_4_left + x_comb_iter_4_right

        x_out = ops.concat((x_comb_iter_1, x_comb_iter_2, x_comb_iter_3, x_comb_iter_4), axis=1)
        return x_out
mindocr.models.backbones.mindcv_models.nasnet.CellStem1

Bases: nn.Cell

NasNet model basic architecture

Source code in mindocr\models\backbones\mindcv_models\nasnet.py
class CellStem1(nn.Cell):
    """NasNet model basic architecture"""

    def __init__(
        self,
        stem_filters: int,
        num_filters: int,
    ) -> None:
        super().__init__()
        self.num_filters = num_filters
        self.stem_filters = stem_filters
        self.conv_1x1 = nn.SequentialCell([
            nn.ReLU(),
            nn.Conv2d(in_channels=2 * self.num_filters, out_channels=self.num_filters, kernel_size=1, stride=1,
                      pad_mode="pad", has_bias=False),
            nn.BatchNorm2d(num_features=self.num_filters, eps=0.001, momentum=0.9, affine=True)])

        self.relu = nn.ReLU()
        self.path_1 = nn.SequentialCell([
            nn.AvgPool2d(kernel_size=1, stride=2, pad_mode="valid"),
            nn.Conv2d(in_channels=self.stem_filters, out_channels=self.num_filters // 2, kernel_size=1, stride=1,
                      pad_mode="pad", has_bias=False)])

        self.path_2 = nn.CellList([])
        self.path_2.append(nn.Pad(paddings=((0, 0), (0, 0), (0, 1), (0, 1)), mode="CONSTANT"))
        self.path_2.append(
            nn.AvgPool2d(kernel_size=1, stride=2, pad_mode="valid")
        )
        self.path_2.append(
            nn.Conv2d(in_channels=self.stem_filters, out_channels=self.num_filters // 2, kernel_size=1, stride=1,
                      pad_mode="pad", has_bias=False)
        )

        self.final_path_bn = nn.BatchNorm2d(num_features=self.num_filters, eps=0.001, momentum=0.9, affine=True)

        self.comb_iter_0_left = BranchSeparables(
            self.num_filters,
            self.num_filters,
            5,
            2,
            2,
            bias=False
        )
        self.comb_iter_0_right = BranchSeparables(
            self.num_filters,
            self.num_filters,
            7,
            2,
            3,
            bias=False
        )

        self.comb_iter_1_left = nn.MaxPool2d(3, stride=2, pad_mode="same")
        self.comb_iter_1_right = BranchSeparables(
            self.num_filters,
            self.num_filters,
            7,
            2,
            3,
            bias=False
        )

        self.comb_iter_2_left = nn.AvgPool2d(3, stride=2, pad_mode="same")
        self.comb_iter_2_right = BranchSeparables(
            self.num_filters,
            self.num_filters,
            5,
            2,
            2,
            bias=False
        )

        self.comb_iter_3_right = nn.AvgPool2d(kernel_size=3, stride=1, pad_mode="same")

        self.comb_iter_4_left = BranchSeparables(
            self.num_filters,
            self.num_filters,
            3,
            1,
            1,
            bias=False
        )
        self.comb_iter_4_right = nn.MaxPool2d(3, stride=2, pad_mode="same")

    def construct(self, x_conv0: Tensor, x_stem_0: Tensor) -> Tensor:
        x_left = self.conv_1x1(x_stem_0)
        x_relu = self.relu(x_conv0)
        # path 1
        x_path1 = self.path_1(x_relu)
        # path 2
        x_path2 = self.path_2[0](x_relu)
        x_path2 = x_path2[:, :, 1:, 1:]
        x_path2 = self.path_2[1](x_path2)
        x_path2 = self.path_2[2](x_path2)
        # final path
        x_right = self.final_path_bn(ops.concat((x_path1, x_path2), axis=1))

        x_comb_iter_0_left = self.comb_iter_0_left(x_left)
        x_comb_iter_0_right = self.comb_iter_0_right(x_right)
        x_comb_iter_0 = x_comb_iter_0_left + x_comb_iter_0_right

        x_comb_iter_1_left = self.comb_iter_1_left(x_left)
        x_comb_iter_1_right = self.comb_iter_1_right(x_right)
        x_comb_iter_1 = x_comb_iter_1_left + x_comb_iter_1_right

        x_comb_iter_2_left = self.comb_iter_2_left(x_left)
        x_comb_iter_2_right = self.comb_iter_2_right(x_right)
        x_comb_iter_2 = x_comb_iter_2_left + x_comb_iter_2_right

        x_comb_iter_3_right = self.comb_iter_3_right(x_comb_iter_0)
        x_comb_iter_3 = x_comb_iter_3_right + x_comb_iter_1

        x_comb_iter_4_left = self.comb_iter_4_left(x_comb_iter_0)
        x_comb_iter_4_right = self.comb_iter_4_right(x_left)
        x_comb_iter_4 = x_comb_iter_4_left + x_comb_iter_4_right

        x_out = ops.concat((x_comb_iter_1, x_comb_iter_2, x_comb_iter_3, x_comb_iter_4), axis=1)
        return x_out
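Both stem cells concatenate four branch outputs of `num_filters` channels each, so their output width is `4 * num_filters`. With NASNetAMobile's defaults (`penultimate_filters=1056`, `filters_multiplier=2`), the channel bookkeeping lines up as follows (a sanity-check sketch of the arithmetic, not library code):

```python
penultimate_filters = 1056
filters_multiplier = 2
filters = penultimate_filters // 24            # 44, the base width

stem0_filters = filters // filters_multiplier ** 2  # 11
stem1_filters = filters // filters_multiplier       # 22

stem0_out = 4 * stem0_filters  # CellStem0 concatenates 4 branches -> 44
stem1_out = 4 * stem1_filters  # CellStem1 concatenates 4 branches -> 88

# CellStem1's 1x1 conv expects 2 * num_filters input channels,
# which is exactly CellStem0's output width:
assert 2 * stem1_filters == stem0_out
print(stem0_out, stem1_out)  # 44 88
```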
mindocr.models.backbones.mindcv_models.nasnet.FirstCell

Bases: nn.Cell

NasNet model basic architecture

Source code in mindocr\models\backbones\mindcv_models\nasnet.py
class FirstCell(nn.Cell):
    """NasNet model basic architecture"""

    def __init__(
        self,
        in_channels_left: int,
        out_channels_left: int,
        in_channels_right: int,
        out_channels_right: int,
    ) -> None:
        super().__init__()
        self.conv_1x1 = nn.SequentialCell([
            nn.ReLU(),
            nn.Conv2d(in_channels=in_channels_right, out_channels=out_channels_right, kernel_size=1, stride=1,
                      pad_mode="pad", has_bias=False),
            nn.BatchNorm2d(num_features=out_channels_right, eps=0.001, momentum=0.9, affine=True)])

        self.relu = nn.ReLU()
        self.path_1 = nn.SequentialCell([
            nn.AvgPool2d(kernel_size=1, stride=2, pad_mode="valid"),
            nn.Conv2d(in_channels=in_channels_left, out_channels=out_channels_left, kernel_size=1, stride=1,
                      pad_mode="pad", has_bias=False)])

        self.path_2 = nn.CellList([])
        self.path_2.append(nn.Pad(paddings=((0, 0), (0, 0), (0, 1), (0, 1)), mode="CONSTANT"))
        self.path_2.append(
            nn.AvgPool2d(kernel_size=1, stride=2, pad_mode="valid")
        )
        self.path_2.append(
            nn.Conv2d(in_channels=in_channels_left, out_channels=out_channels_left, kernel_size=1, stride=1,
                      pad_mode="pad", has_bias=False)
        )

        self.final_path_bn = nn.BatchNorm2d(num_features=out_channels_left * 2, eps=0.001, momentum=0.9, affine=True)

        self.comb_iter_0_left = BranchSeparables(
            out_channels_right, out_channels_right, 5, 1, 2, bias=False
        )
        self.comb_iter_0_right = BranchSeparables(
            out_channels_right, out_channels_right, 3, 1, 1, bias=False
        )

        self.comb_iter_1_left = BranchSeparables(
            out_channels_right, out_channels_right, 5, 1, 2, bias=False
        )
        self.comb_iter_1_right = BranchSeparables(
            out_channels_right, out_channels_right, 3, 1, 1, bias=False
        )

        self.comb_iter_2_left = nn.AvgPool2d(kernel_size=3, stride=1, pad_mode="same")

        self.comb_iter_3_left = nn.AvgPool2d(kernel_size=3, stride=1, pad_mode="same")
        self.comb_iter_3_right = nn.AvgPool2d(kernel_size=3, stride=1, pad_mode="same")

        self.comb_iter_4_left = BranchSeparables(
            out_channels_right, out_channels_right, 3, 1, 1, bias=False
        )

    def construct(self, x: Tensor, x_prev: Tensor) -> Tensor:
        x_relu = self.relu(x_prev)
        x_path1 = self.path_1(x_relu)
        x_path2 = self.path_2[0](x_relu)
        x_path2 = x_path2[:, :, 1:, 1:]
        x_path2 = self.path_2[1](x_path2)
        x_path2 = self.path_2[2](x_path2)
        # final path
        x_left = self.final_path_bn(ops.concat((x_path1, x_path2), axis=1))

        x_right = self.conv_1x1(x)

        x_comb_iter_0_left = self.comb_iter_0_left(x_right)
        x_comb_iter_0_right = self.comb_iter_0_right(x_left)
        x_comb_iter_0 = x_comb_iter_0_left + x_comb_iter_0_right

        x_comb_iter_1_left = self.comb_iter_1_left(x_left)
        x_comb_iter_1_right = self.comb_iter_1_right(x_left)
        x_comb_iter_1 = x_comb_iter_1_left + x_comb_iter_1_right

        x_comb_iter_2_left = self.comb_iter_2_left(x_right)
        x_comb_iter_2 = x_comb_iter_2_left + x_left

        x_comb_iter_3_left = self.comb_iter_3_left(x_left)
        x_comb_iter_3_right = self.comb_iter_3_right(x_left)
        x_comb_iter_3 = x_comb_iter_3_left + x_comb_iter_3_right

        x_comb_iter_4_left = self.comb_iter_4_left(x_right)
        x_comb_iter_4 = x_comb_iter_4_left + x_right

        x_out = ops.concat((x_left, x_comb_iter_0, x_comb_iter_1, x_comb_iter_2, x_comb_iter_3, x_comb_iter_4), axis=1)
        return x_out
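`FirstCell` concatenates six tensors: `x_left` (with `2 * out_channels_left` channels, the width of `final_path_bn`) plus five combination outputs of `out_channels_right` channels each. For `cell_0`, where `out_channels_left = filters // 2` and `out_channels_right = filters`, this yields the `6 * filters` width that the `# 6, 1` comments in the NASNetAMobile listing refer to. A quick arithmetic check:

```python
def first_cell_out_channels(out_channels_left: int, out_channels_right: int) -> int:
    """Concat of x_left (2 * left channels) and five comb_iter outputs."""
    return 2 * out_channels_left + 5 * out_channels_right

filters = 1056 // 24  # 44, the NASNetAMobile base width
# cell_0: out_channels_left = filters // 2, out_channels_right = filters
print(first_cell_out_channels(filters // 2, filters))  # 264, i.e. 6 * filters
```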
mindocr.models.backbones.mindcv_models.nasnet.NASNetAMobile

Bases: nn.Cell

NasNet model class, based on "Learning Transferable Architectures for Scalable Image Recognition" (https://arxiv.org/pdf/1707.07012v4.pdf).

PARAMETER DESCRIPTION
num_classes

number of classification classes. Default: 1000.

TYPE: int DEFAULT: 1000

stem_filters

number of stem filters. Default: 32.

TYPE: int DEFAULT: 32

penultimate_filters

number of penultimate filters. Default: 1056.

TYPE: int DEFAULT: 1056

filters_multiplier

size of filters multiplier. Default: 2.

TYPE: int DEFAULT: 2

Source code in mindocr\models\backbones\mindcv_models\nasnet.py
class NASNetAMobile(nn.Cell):
    r"""NasNet model class, based on
    `"Learning Transferable Architectures for Scalable Image Recognition" <https://arxiv.org/pdf/1707.07012v4.pdf>`_
    Args:
        num_classes: number of classification classes. Default: 1000.
        stem_filters: number of stem filters. Default: 32.
        penultimate_filters: number of penultimate filters. Default: 1056.
        filters_multiplier: size of filters multiplier. Default: 2.
    """

    def __init__(
        self,
        in_channels: int = 3,
        num_classes: int = 1000,
        stem_filters: int = 32,
        penultimate_filters: int = 1056,
        filters_multiplier: int = 2,
    ) -> None:
        super().__init__()
        self.stem_filters = stem_filters
        self.penultimate_filters = penultimate_filters
        self.filters_multiplier = filters_multiplier

        filters = self.penultimate_filters // 24
        # 24 is default value for the architecture

        self.conv0 = nn.SequentialCell([
            nn.Conv2d(in_channels=in_channels, out_channels=self.stem_filters, kernel_size=3, stride=2, pad_mode="pad",
                      padding=0,
                      has_bias=False),
            nn.BatchNorm2d(num_features=self.stem_filters, eps=0.001, momentum=0.9, affine=True)
        ])

        self.cell_stem_0 = CellStem0(
            self.stem_filters, num_filters=filters // (filters_multiplier ** 2)
        )
        self.cell_stem_1 = CellStem1(
            self.stem_filters, num_filters=filters // filters_multiplier
        )

        self.cell_0 = FirstCell(
            in_channels_left=filters,
            out_channels_left=filters // 2,  # 1, 0.5
            in_channels_right=2 * filters,
            out_channels_right=filters,
        )  # 2, 1
        self.cell_1 = NormalCell(
            in_channels_left=2 * filters,
            out_channels_left=filters,  # 2, 1
            in_channels_right=6 * filters,
            out_channels_right=filters,
        )  # 6, 1
        self.cell_2 = NormalCell(
            in_channels_left=6 * filters,
            out_channels_left=filters,  # 6, 1
            in_channels_right=6 * filters,
            out_channels_right=filters,
        )  # 6, 1
        self.cell_3 = NormalCell(
            in_channels_left=6 * filters,
            out_channels_left=filters,  # 6, 1
            in_channels_right=6 * filters,
            out_channels_right=filters,
        )  # 6, 1

        self.reduction_cell_0 = ReductionCell0(
            in_channels_left=6 * filters,
            out_channels_left=2 * filters,  # 6, 2
            in_channels_right=6 * filters,
            out_channels_right=2 * filters,
        )  # 6, 2

        self.cell_6 = FirstCell(
            in_channels_left=6 * filters,
            out_channels_left=filters,  # 6, 1
            in_channels_right=8 * filters,
            out_channels_right=2 * filters,
        )  # 8, 2
        self.cell_7 = NormalCell(
            in_channels_left=8 * filters,
            out_channels_left=2 * filters,  # 8, 2
            in_channels_right=12 * filters,
            out_channels_right=2 * filters,
        )  # 12, 2
        self.cell_8 = NormalCell(
            in_channels_left=12 * filters,
            out_channels_left=2 * filters,  # 12, 2
            in_channels_right=12 * filters,
            out_channels_right=2 * filters,
        )  # 12, 2
        self.cell_9 = NormalCell(
            in_channels_left=12 * filters,
            out_channels_left=2 * filters,  # 12, 2
            in_channels_right=12 * filters,
            out_channels_right=2 * filters,
        )  # 12, 2

        self.reduction_cell_1 = ReductionCell1(
            in_channels_left=12 * filters,
            out_channels_left=4 * filters,  # 12, 4
            in_channels_right=12 * filters,
            out_channels_right=4 * filters,
        )  # 12, 4

        self.cell_12 = FirstCell(
            in_channels_left=12 * filters,
            out_channels_left=2 * filters,  # 12, 2
            in_channels_right=16 * filters,
            out_channels_right=4 * filters,
        )  # 16, 4
        self.cell_13 = NormalCell(
            in_channels_left=16 * filters,
            out_channels_left=4 * filters,  # 16, 4
            in_channels_right=24 * filters,
            out_channels_right=4 * filters,
        )  # 24, 4
        self.cell_14 = NormalCell(
            in_channels_left=24 * filters,
            out_channels_left=4 * filters,  # 24, 4
            in_channels_right=24 * filters,
            out_channels_right=4 * filters,
        )  # 24, 4
        self.cell_15 = NormalCell(
            in_channels_left=24 * filters,
            out_channels_left=4 * filters,  # 24, 4
            in_channels_right=24 * filters,
            out_channels_right=4 * filters,
        )  # 24, 4

        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(keep_prob=0.5)
        self.classifier = nn.Dense(in_channels=24 * filters, out_channels=num_classes)
        self.pool = GlobalAvgPooling()
        self._initialize_weights()

    def _initialize_weights(self):
        """Initialize weights for cells."""
        self.init_parameters_data()
        for _, cell in self.cells_and_names():
            if isinstance(cell, nn.Conv2d):
                n = cell.kernel_size[0] * cell.kernel_size[1] * cell.out_channels
                cell.weight.set_data(init.initializer(init.Normal(math.sqrt(2. / n), 0),
                                                      cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(init.initializer(init.Zero(), cell.bias.shape, cell.bias.dtype))
            elif isinstance(cell, nn.BatchNorm2d):
                cell.gamma.set_data(init.initializer(init.One(), cell.gamma.shape, cell.gamma.dtype))
                cell.beta.set_data(init.initializer(init.Zero(), cell.beta.shape, cell.beta.dtype))
            elif isinstance(cell, nn.Dense):
                cell.weight.set_data(init.initializer(init.Normal(0.01, 0), cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(init.initializer(init.Zero(), cell.bias.shape, cell.bias.dtype))

    def forward_features(self, x: Tensor) -> Tensor:
        """Network forward feature extraction."""
        x_conv0 = self.conv0(x)
        x_stem_0 = self.cell_stem_0(x_conv0)
        x_stem_1 = self.cell_stem_1(x_conv0, x_stem_0)

        x_cell_0 = self.cell_0(x_stem_1, x_stem_0)
        x_cell_1 = self.cell_1(x_cell_0, x_stem_1)
        x_cell_2 = self.cell_2(x_cell_1, x_cell_0)
        x_cell_3 = self.cell_3(x_cell_2, x_cell_1)

        x_reduction_cell_0 = self.reduction_cell_0(x_cell_3, x_cell_2)

        x_cell_6 = self.cell_6(x_reduction_cell_0, x_cell_3)
        x_cell_7 = self.cell_7(x_cell_6, x_reduction_cell_0)
        x_cell_8 = self.cell_8(x_cell_7, x_cell_6)
        x_cell_9 = self.cell_9(x_cell_8, x_cell_7)

        x_reduction_cell_1 = self.reduction_cell_1(x_cell_9, x_cell_8)

        x_cell_12 = self.cell_12(x_reduction_cell_1, x_cell_9)
        x_cell_13 = self.cell_13(x_cell_12, x_reduction_cell_1)
        x_cell_14 = self.cell_14(x_cell_13, x_cell_12)
        x_cell_15 = self.cell_15(x_cell_14, x_cell_13)

        x_cell_15 = self.relu(x_cell_15)
        return x_cell_15

    def forward_head(self, x: Tensor) -> Tensor:
        x = self.pool(x)  # global average pool
        x = self.dropout(x)
        x = self.classifier(x)
        return x

    def construct(self, x: Tensor) -> Tensor:
        x = self.forward_features(x)
        x = self.forward_head(x)
        return x
mindocr.models.backbones.mindcv_models.nasnet.NASNetAMobile.forward_features(x)

Network forward feature extraction.

Source code in mindocr\models\backbones\mindcv_models\nasnet.py
def forward_features(self, x: Tensor) -> Tensor:
    """Network forward feature extraction."""
    x_conv0 = self.conv0(x)
    x_stem_0 = self.cell_stem_0(x_conv0)
    x_stem_1 = self.cell_stem_1(x_conv0, x_stem_0)

    x_cell_0 = self.cell_0(x_stem_1, x_stem_0)
    x_cell_1 = self.cell_1(x_cell_0, x_stem_1)
    x_cell_2 = self.cell_2(x_cell_1, x_cell_0)
    x_cell_3 = self.cell_3(x_cell_2, x_cell_1)

    x_reduction_cell_0 = self.reduction_cell_0(x_cell_3, x_cell_2)

    x_cell_6 = self.cell_6(x_reduction_cell_0, x_cell_3)
    x_cell_7 = self.cell_7(x_cell_6, x_reduction_cell_0)
    x_cell_8 = self.cell_8(x_cell_7, x_cell_6)
    x_cell_9 = self.cell_9(x_cell_8, x_cell_7)

    x_reduction_cell_1 = self.reduction_cell_1(x_cell_9, x_cell_8)

    x_cell_12 = self.cell_12(x_reduction_cell_1, x_cell_9)
    x_cell_13 = self.cell_13(x_cell_12, x_reduction_cell_1)
    x_cell_14 = self.cell_14(x_cell_13, x_cell_12)
    x_cell_15 = self.cell_15(x_cell_14, x_cell_13)

    x_cell_15 = self.relu(x_cell_15)
    return x_cell_15
mindocr.models.backbones.mindcv_models.nasnet.NormalCell

Bases: nn.Cell

NasNet model basic architecture

Source code in mindocr\models\backbones\mindcv_models\nasnet.py
class NormalCell(nn.Cell):
    """NasNet model basic architecture"""
    def __init__(self,
                 in_channels_left: int,
                 out_channels_left: int,
                 in_channels_right: int,
                 out_channels_right: int) -> None:
        super().__init__()
        self.conv_prev_1x1 = nn.SequentialCell([
            nn.ReLU(),
            nn.Conv2d(in_channels=in_channels_left, out_channels=out_channels_left, kernel_size=1, stride=1,
                      pad_mode="pad", has_bias=False),
            nn.BatchNorm2d(num_features=out_channels_left, eps=0.001, momentum=0.9, affine=True)])

        self.conv_1x1 = nn.SequentialCell([
            nn.ReLU(),
            nn.Conv2d(in_channels=in_channels_right, out_channels=out_channels_right, kernel_size=1, stride=1,
                      pad_mode="pad", has_bias=False),
            nn.BatchNorm2d(num_features=out_channels_right, eps=0.001, momentum=0.9, affine=True)])

        self.comb_iter_0_left = BranchSeparables(
            out_channels_right, out_channels_right, 5, 1, 2, bias=False
        )
        self.comb_iter_0_right = BranchSeparables(
            out_channels_left, out_channels_left, 3, 1, 1, bias=False
        )

        self.comb_iter_1_left = BranchSeparables(
            out_channels_left, out_channels_left, 5, 1, 2, bias=False
        )
        self.comb_iter_1_right = BranchSeparables(
            out_channels_left, out_channels_left, 3, 1, 1, bias=False
        )

        self.comb_iter_2_left = nn.AvgPool2d(kernel_size=3, stride=1, pad_mode="same")

        self.comb_iter_3_left = nn.AvgPool2d(kernel_size=3, stride=1, pad_mode="same")
        self.comb_iter_3_right = nn.AvgPool2d(kernel_size=3, stride=1, pad_mode="same")

        self.comb_iter_4_left = BranchSeparables(
            out_channels_right, out_channels_right, 3, 1, 1, bias=False
        )

    def construct(self, x: Tensor, x_prev: Tensor) -> Tensor:
        x_left = self.conv_prev_1x1(x_prev)
        x_right = self.conv_1x1(x)

        x_comb_iter_0_left = self.comb_iter_0_left(x_right)
        x_comb_iter_0_right = self.comb_iter_0_right(x_left)
        x_comb_iter_0 = x_comb_iter_0_left + x_comb_iter_0_right

        x_comb_iter_1_left = self.comb_iter_1_left(x_left)
        x_comb_iter_1_right = self.comb_iter_1_right(x_left)
        x_comb_iter_1 = x_comb_iter_1_left + x_comb_iter_1_right

        x_comb_iter_2_left = self.comb_iter_2_left(x_right)
        x_comb_iter_2 = x_comb_iter_2_left + x_left

        x_comb_iter_3_left = self.comb_iter_3_left(x_left)
        x_comb_iter_3_right = self.comb_iter_3_right(x_left)
        x_comb_iter_3 = x_comb_iter_3_left + x_comb_iter_3_right

        x_comb_iter_4_left = self.comb_iter_4_left(x_right)
        x_comb_iter_4 = x_comb_iter_4_left + x_right

        x_out = ops.concat((x_left, x_comb_iter_0, x_comb_iter_1, x_comb_iter_2, x_comb_iter_3, x_comb_iter_4), axis=1)
        return x_out
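The final `concat` explains the `6 * filters` channel counts wired into NASNetAMobile above: inside the backbone every NormalCell is built with `out_channels_left == out_channels_right == filters`, so `x_left` plus the five combination branches each carry `filters` channels. A quick NumPy shape check (array sizes are illustrative assumptions):

```python
import numpy as np

# With out_channels_left == out_channels_right == f (as wired in NASNetAMobile),
# NormalCell concatenates x_left plus five combination branches, each carrying
# f channels, so its output has 6*f channels.
f = 44  # NASNet-A-Mobile: penultimate_filters=1056 -> 1056 // 24 = 44
branches = [np.zeros((1, f, 28, 28)) for _ in range(6)]  # x_left + 5 comb iters
out = np.concatenate(branches, axis=1)
assert out.shape[1] == 6 * f  # matches in_channels_right=6*filters downstream
```

This is why `cell_1` through `cell_3` declare `in_channels_right=6 * filters`.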
mindocr.models.backbones.mindcv_models.nasnet.ReductionCell0

Bases: nn.Cell

NasNet model Residual Connections

Source code in mindocr\models\backbones\mindcv_models\nasnet.py
class ReductionCell0(nn.Cell):
    """NasNet model Residual Connections"""

    def __init__(
        self,
        in_channels_left: int,
        out_channels_left: int,
        in_channels_right: int,
        out_channels_right: int,
    ) -> None:
        super().__init__()
        self.conv_prev_1x1 = nn.SequentialCell([
            nn.ReLU(),
            nn.Conv2d(in_channels=in_channels_left, out_channels=out_channels_left, kernel_size=1, stride=1,
                      pad_mode="pad", has_bias=False),
            nn.BatchNorm2d(num_features=out_channels_left, eps=0.001, momentum=0.9, affine=True)])

        self.conv_1x1 = nn.SequentialCell([
            nn.ReLU(),
            nn.Conv2d(in_channels=in_channels_right, out_channels=out_channels_right, kernel_size=1, stride=1,
                      pad_mode="pad", has_bias=False),
            nn.BatchNorm2d(num_features=out_channels_right, eps=0.001, momentum=0.9, affine=True)])

        self.comb_iter_0_left = BranchSeparablesReduction(
            out_channels_right, out_channels_right, 5, 2, 2, bias=False
        )
        self.comb_iter_0_right = BranchSeparablesReduction(
            out_channels_right, out_channels_right, 7, 2, 3, bias=False
        )

        self.comb_iter_1_left = nn.MaxPool2d(3, stride=2, pad_mode="same")
        self.comb_iter_1_right = BranchSeparablesReduction(
            out_channels_right, out_channels_right, 7, 2, 3, bias=False
        )

        self.comb_iter_2_left = nn.AvgPool2d(3, stride=2, pad_mode="same")
        self.comb_iter_2_right = BranchSeparablesReduction(
            out_channels_right, out_channels_right, 5, 2, 2, bias=False
        )

        self.comb_iter_3_right = nn.AvgPool2d(kernel_size=3, stride=1, pad_mode="same")

        self.comb_iter_4_left = BranchSeparablesReduction(
            out_channels_right, out_channels_right, 3, 1, 1, bias=False
        )
        self.comb_iter_4_right = nn.MaxPool2d(3, stride=2, pad_mode="same")

    def construct(self, x: Tensor, x_prev: Tensor) -> Tensor:
        x_left = self.conv_prev_1x1(x_prev)
        x_right = self.conv_1x1(x)

        x_comb_iter_0_left = self.comb_iter_0_left(x_right)
        x_comb_iter_0_right = self.comb_iter_0_right(x_left)
        x_comb_iter_0 = x_comb_iter_0_left + x_comb_iter_0_right

        x_comb_iter_1_left = self.comb_iter_1_left(x_right)
        x_comb_iter_1_right = self.comb_iter_1_right(x_left)
        x_comb_iter_1 = x_comb_iter_1_left + x_comb_iter_1_right

        x_comb_iter_2_left = self.comb_iter_2_left(x_right)
        x_comb_iter_2_right = self.comb_iter_2_right(x_left)
        x_comb_iter_2 = x_comb_iter_2_left + x_comb_iter_2_right

        x_comb_iter_3_right = self.comb_iter_3_right(x_comb_iter_0)
        x_comb_iter_3 = x_comb_iter_3_right + x_comb_iter_1

        x_comb_iter_4_left = self.comb_iter_4_left(x_comb_iter_0)
        x_comb_iter_4_right = self.comb_iter_4_right(x_right)
        x_comb_iter_4 = x_comb_iter_4_left + x_comb_iter_4_right

        x_out = ops.concat((x_comb_iter_1, x_comb_iter_2, x_comb_iter_3, x_comb_iter_4), axis=1)
        return x_out
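The same channel bookkeeping holds for the reduction cell: four combination branches, each with `out_channels_right` channels, are concatenated. With `out_channels_right = 2 * filters` (as wired in NASNetAMobile), the output carries `8 * filters` channels, matching `in_channels_right=8 * filters` on the following FirstCell. A minimal arithmetic check:

```python
# ReductionCell0 concatenates 4 branches of out_channels_right channels each.
filters = 44                       # NASNet-A-Mobile: 1056 // 24
out_channels_right = 2 * filters   # as passed by NASNetAMobile
output_channels = 4 * out_channels_right
assert output_channels == 8 * filters  # -> cell_6's in_channels_right
```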
mindocr.models.backbones.mindcv_models.nasnet.ReductionCell1

Bases: nn.Cell

NasNet model Residual Connections

Source code in mindocr\models\backbones\mindcv_models\nasnet.py
class ReductionCell1(nn.Cell):
    """NasNet model Residual Connections"""

    def __init__(
        self,
        in_channels_left: int,
        out_channels_left: int,
        in_channels_right: int,
        out_channels_right: int,
    ) -> None:
        super().__init__()
        self.conv_prev_1x1 = nn.SequentialCell([
            nn.ReLU(),
            nn.Conv2d(in_channels=in_channels_left, out_channels=out_channels_left, kernel_size=1, stride=1,
                      pad_mode="pad", has_bias=False),
            nn.BatchNorm2d(num_features=out_channels_left, eps=0.001, momentum=0.9, affine=True)])

        self.conv_1x1 = nn.SequentialCell([
            nn.ReLU(),
            nn.Conv2d(in_channels=in_channels_right, out_channels=out_channels_right, kernel_size=1, stride=1,
                      pad_mode="pad", has_bias=False),
            nn.BatchNorm2d(num_features=out_channels_right, eps=0.001, momentum=0.9, affine=True)])

        self.comb_iter_0_left = BranchSeparables(
            out_channels_right,
            out_channels_right,
            5,
            2,
            2,
            bias=False
        )
        self.comb_iter_0_right = BranchSeparables(
            out_channels_right,
            out_channels_right,
            7,
            2,
            3,
            bias=False
        )

        self.comb_iter_1_left = nn.MaxPool2d(3, stride=2, pad_mode="same")
        self.comb_iter_1_right = BranchSeparables(
            out_channels_right,
            out_channels_right,
            7,
            2,
            3,
            bias=False
        )

        self.comb_iter_2_left = nn.AvgPool2d(3, stride=2, pad_mode="same")
        self.comb_iter_2_right = BranchSeparables(
            out_channels_right,
            out_channels_right,
            5,
            2,
            2,
            bias=False
        )

        self.comb_iter_3_right = nn.AvgPool2d(kernel_size=3, stride=1, pad_mode="same")

        self.comb_iter_4_left = BranchSeparables(
            out_channels_right,
            out_channels_right,
            3,
            1,
            1,
            bias=False
        )
        self.comb_iter_4_right = nn.MaxPool2d(3, stride=2, pad_mode="same")

    def construct(self, x: Tensor, x_prev: Tensor) -> Tensor:
        x_left = self.conv_prev_1x1(x_prev)
        x_right = self.conv_1x1(x)

        x_comb_iter_0_left = self.comb_iter_0_left(x_right)
        x_comb_iter_0_right = self.comb_iter_0_right(x_left)
        x_comb_iter_0 = x_comb_iter_0_left + x_comb_iter_0_right

        x_comb_iter_1_left = self.comb_iter_1_left(x_right)
        x_comb_iter_1_right = self.comb_iter_1_right(x_left)
        x_comb_iter_1 = x_comb_iter_1_left + x_comb_iter_1_right

        x_comb_iter_2_left = self.comb_iter_2_left(x_right)
        x_comb_iter_2_right = self.comb_iter_2_right(x_left)
        x_comb_iter_2 = x_comb_iter_2_left + x_comb_iter_2_right

        x_comb_iter_3_right = self.comb_iter_3_right(x_comb_iter_0)
        x_comb_iter_3 = x_comb_iter_3_right + x_comb_iter_1

        x_comb_iter_4_left = self.comb_iter_4_left(x_comb_iter_0)
        x_comb_iter_4_right = self.comb_iter_4_right(x_right)
        x_comb_iter_4 = x_comb_iter_4_left + x_comb_iter_4_right

        x_out = ops.concat((x_comb_iter_1, x_comb_iter_2, x_comb_iter_3, x_comb_iter_4), axis=1)
        return x_out
mindocr.models.backbones.mindcv_models.nasnet.SeparableConv2d

Bases: nn.Cell

depth-wise convolutions + point-wise convolutions

Source code in mindocr\models\backbones\mindcv_models\nasnet.py
class SeparableConv2d(nn.Cell):
    """depth-wise convolutions + point-wise convolutions"""

    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        dw_kernel: int,
        dw_stride: int,
        dw_padding: int,
        bias: bool = False,
    ) -> None:
        super().__init__()
        self.depthwise_conv2d = nn.Conv2d(in_channels=in_channels, out_channels=in_channels, kernel_size=dw_kernel,
                                          stride=dw_stride, pad_mode="pad", padding=dw_padding, group=in_channels,
                                          has_bias=bias)
        self.pointwise_conv2d = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=1, stride=1,
                                          pad_mode="pad", has_bias=bias)

    def construct(self, x: Tensor) -> Tensor:
        x = self.depthwise_conv2d(x)
        x = self.pointwise_conv2d(x)
        return x
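The factorization above is what makes NASNet's branches cheap: a grouped `k x k` depthwise convolution followed by a `1 x 1` pointwise projection uses far fewer weights than a full convolution. A quick parameter count (weights only, no bias; the channel and kernel values are illustrative assumptions):

```python
# Weight count of a standard k x k conv vs. the depthwise + pointwise pair.
def full_conv_params(cin, cout, k):
    return cin * cout * k * k

def separable_conv_params(cin, cout, k):
    depthwise = cin * k * k   # group=cin: one k x k filter per input channel
    pointwise = cin * cout    # 1 x 1 projection to cout channels
    return depthwise + pointwise

cin, cout, k = 44, 44, 5
print(full_conv_params(cin, cout, k))       # 48400
print(separable_conv_params(cin, cout, k))  # 3036
```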
mindocr.models.backbones.mindcv_models.nasnet.nasnet_a_4x1056(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get NasNet model. Refer to the base class models.NASNetAMobile for more details.

Source code in mindocr\models\backbones\mindcv_models\nasnet.py
@register_model
def nasnet_a_4x1056(pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3, **kwargs) -> NASNetAMobile:
    """Get NasNet model.
    Refer to the base class `models.NASNetAMobile` for more details."""
    default_cfg = default_cfgs["nasnet_a_4x1056"]
    model = NASNetAMobile(in_channels=in_channels, num_classes=num_classes, **kwargs)
    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)
    return model
mindocr.models.backbones.mindcv_models.path

Utility of file path

mindocr.models.backbones.mindcv_models.path.detect_file_type(filename)

Detect file type by suffixes and return tuple(suffix, archive_type, compression).

Source code in mindocr\models\backbones\mindcv_models\path.py
def detect_file_type(filename: str):  # pylint: disable=inconsistent-return-statements
    """Detect file type by suffixes and return tuple(suffix, archive_type, compression)."""
    suffixes = pathlib.Path(filename).suffixes
    if not suffixes:
        raise RuntimeError(f"File `{filename}` has no suffixes that could be used to detect.")
    suffix = suffixes[-1]

    # Check if the suffix is a known alias.
    if suffix in FILE_TYPE_ALIASES:
        return suffix, FILE_TYPE_ALIASES[suffix][0], FILE_TYPE_ALIASES[suffix][1]

    # Check if the suffix is an archive type.
    if suffix in ARCHIVE_TYPE_SUFFIX:
        return suffix, suffix, None

    # Check if the suffix is a compression.
    if suffix in COMPRESS_TYPE_SUFFIX:
        # Check for suffix hierarchy.
        if len(suffixes) > 1:
            suffix2 = suffixes[-2]
            # Check if the suffix2 is an archive type.
            if suffix2 in ARCHIVE_TYPE_SUFFIX:
                return suffix2 + suffix, suffix2, suffix
        return suffix, None, suffix
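The suffix-hierarchy logic can be exercised with a self-contained sketch. The alias/archive/compression tables below are a hypothetical minimal subset (the real module defines fuller `FILE_TYPE_ALIASES`, `ARCHIVE_TYPE_SUFFIX`, and `COMPRESS_TYPE_SUFFIX` mappings):

```python
import pathlib

# Minimal stand-in tables for illustration only.
FILE_TYPE_ALIASES = {".tgz": (".tar", ".gz")}
ARCHIVE_TYPE_SUFFIX = {".tar", ".zip"}
COMPRESS_TYPE_SUFFIX = {".gz", ".xz"}

def detect_file_type(filename):
    """Return (suffix, archive_type, compression) following the logic above."""
    suffixes = pathlib.Path(filename).suffixes
    if not suffixes:
        raise RuntimeError(f"File `{filename}` has no suffixes that could be used to detect.")
    suffix = suffixes[-1]
    if suffix in FILE_TYPE_ALIASES:
        return suffix, FILE_TYPE_ALIASES[suffix][0], FILE_TYPE_ALIASES[suffix][1]
    if suffix in ARCHIVE_TYPE_SUFFIX:
        return suffix, suffix, None
    if suffix in COMPRESS_TYPE_SUFFIX:
        # e.g. ".tar.gz": the archive suffix sits one level below the compression.
        if len(suffixes) > 1 and suffixes[-2] in ARCHIVE_TYPE_SUFFIX:
            return suffixes[-2] + suffix, suffixes[-2], suffix
        return suffix, None, suffix
    raise RuntimeError(f"Unknown suffix `{suffix}`.")

print(detect_file_type("weights.tar.gz"))  # ('.tar.gz', '.tar', '.gz')
print(detect_file_type("ckpt.tgz"))        # ('.tgz', '.tar', '.gz')
```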
mindocr.models.backbones.mindcv_models.pit

MindSpore implementation of PiT. Refer to Rethinking Spatial Dimensions of Vision Transformers.

mindocr.models.backbones.mindcv_models.pit.Attention

Bases: nn.Cell

define multi-head self attention block

Source code in mindocr\models\backbones\mindcv_models\pit.py
class Attention(nn.Cell):
    """define multi-head self attention block"""

    def __init__(
        self,
        dim: int,
        num_heads: int = 8,
        qkv_bias: bool = False,
        attn_drop: float = 0.0,
        proj_drop: float = 0.0,
    ) -> None:
        super().__init__()
        self.num_heads = num_heads
        head_dim = dim // num_heads
        self.scale = head_dim**-0.5
        # get pair-wise relative position index for each token inside the window
        self.q = nn.Dense(in_channels=dim, out_channels=dim, has_bias=qkv_bias)
        self.k = nn.Dense(in_channels=dim, out_channels=dim, has_bias=qkv_bias)
        self.v = nn.Dense(in_channels=dim, out_channels=dim, has_bias=qkv_bias)
        self.attn_drop = nn.Dropout(keep_prob=1 - attn_drop)
        self.proj = nn.Dense(dim, dim)
        self.proj_drop = nn.Dropout(keep_prob=1 - proj_drop)
        self.softmax = nn.Softmax(axis=-1)

        self.batchmatmul = ops.BatchMatMul()

    def construct(self, x):
        B, N, C = x.shape
        q = ops.reshape(self.q(x), (B, N, self.num_heads, C // self.num_heads)) * self.scale
        q = ops.transpose(q, (0, 2, 1, 3))
        k = ops.reshape(self.k(x), (B, N, self.num_heads, C // self.num_heads))
        k = ops.transpose(k, (0, 2, 3, 1))
        v = ops.reshape(self.v(x), (B, N, self.num_heads, C // self.num_heads))
        v = ops.transpose(v, (0, 2, 1, 3))

        attn = self.batchmatmul(q, k)
        attn = self.softmax(attn)
        attn = self.attn_drop(attn)

        x = self.batchmatmul(attn, v)
        x = ops.reshape(ops.transpose(x, (0, 2, 1, 3)), (B, N, C))
        x = self.proj(x)
        x = self.proj_drop(x)
        return x
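The reshape/transpose bookkeeping in `construct` can be traced with plain NumPy. The sketch below uses identity projections in place of the trained `q`/`k`/`v` Dense layers (an assumption for illustration); shapes follow the code above, `(B, N, C) -> (B, heads, N, head_dim) -> (B, N, C)`:

```python
import numpy as np

B, N, C, num_heads = 2, 50, 64, 8
head_dim = C // num_heads
rng = np.random.default_rng(0)
x = rng.standard_normal((B, N, C))

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Same scaling and layout as Attention.construct:
q = (x.reshape(B, N, num_heads, head_dim) * head_dim ** -0.5).transpose(0, 2, 1, 3)
k = x.reshape(B, N, num_heads, head_dim).transpose(0, 2, 3, 1)  # (B, h, d, N)
v = x.reshape(B, N, num_heads, head_dim).transpose(0, 2, 1, 3)

attn = softmax(q @ k)  # (B, h, N, N); k is pre-transposed, so no extra .T
out = (attn @ v).transpose(0, 2, 1, 3).reshape(B, N, C)
assert out.shape == (B, N, C)
```

Note that `k` is transposed to `(B, h, d, N)` up front, so the `BatchMatMul` directly yields the `(N, N)` attention map per head.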
mindocr.models.backbones.mindcv_models.pit.Block

Bases: nn.Cell

define the basic block of PiT

Source code in mindocr\models\backbones\mindcv_models\pit.py
class Block(nn.Cell):
    """define the basic block of PiT"""

    def __init__(
        self,
        dim: int,
        num_heads: int,
        mlp_ratio: float = 4.0,
        qkv_bias: bool = False,
        drop: float = 0.0,
        attn_drop: float = 0.0,
        drop_path: float = 0.0,
        act_layer: nn.Cell = nn.GELU,
        norm_layer: nn.Cell = nn.LayerNorm,
    ) -> None:
        super().__init__()
        self.norm1 = norm_layer((dim,), epsilon=1e-6)
        self.attn = Attention(dim, num_heads=num_heads, qkv_bias=qkv_bias, attn_drop=attn_drop, proj_drop=drop)
        self.drop_path = DropPath(drop_path) if drop_path > 0.0 else Identity()
        self.norm2 = norm_layer((dim,), epsilon=1e-6)
        mlp_hidden_dim = int(dim * mlp_ratio)
        self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)

    def construct(self, x):
        x = x + self.drop_path(self.attn(self.norm1(x)))
        x = x + self.drop_path(self.mlp(self.norm2(x)))
        return x
mindocr.models.backbones.mindcv_models.pit.Mlp

Bases: nn.Cell

MLP as used in Vision Transformer, MLP-Mixer and related networks

Source code in mindocr\models\backbones\mindcv_models\pit.py
class Mlp(nn.Cell):
    """MLP as used in Vision Transformer, MLP-Mixer and related networks"""

    def __init__(
        self,
        in_features: int,
        hidden_features: int = None,
        out_features: int = None,
        act_layer: nn.Cell = nn.GELU,
        drop: float = 0.0,
    ) -> None:
        super().__init__()
        out_features = out_features or in_features
        hidden_features = hidden_features or in_features
        self.fc1 = nn.Dense(in_channels=in_features, out_channels=hidden_features, has_bias=True)
        self.act = act_layer()
        self.fc2 = nn.Dense(in_channels=hidden_features, out_channels=out_features, has_bias=True)
        self.drop = nn.Dropout(keep_prob=1.0 - drop)

    def construct(self, x):
        x = self.fc1(x)
        x = self.act(x)
        x = self.drop(x)
        x = self.fc2(x)
        x = self.drop(x)
        return x
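The forward pass reduces to two affine maps around an activation. A minimal NumPy equivalent (dropout omitted as in inference, and random weights standing in for the trained Dense layers) confirms the shape behavior, including that `out_features` defaults back to `in_features`:

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU, close to nn.GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

in_f, hidden_f = 64, 256  # hidden = in_features * mlp_ratio (4.0)
rng = np.random.default_rng(0)
w1, b1 = rng.standard_normal((in_f, hidden_f)) * 0.02, np.zeros(hidden_f)
w2, b2 = rng.standard_normal((hidden_f, in_f)) * 0.02, np.zeros(in_f)

x = rng.standard_normal((2, 50, in_f))  # (batch, tokens, dim)
y = gelu(x @ w1 + b1) @ w2 + b2         # fc1 -> act -> fc2
assert y.shape == x.shape               # out_features defaults to in_features
```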
mindocr.models.backbones.mindcv_models.pit.PoolingTransformer

Bases: nn.Cell

PiT model class, based on "Rethinking Spatial Dimensions of Vision Transformers" <https://arxiv.org/abs/2103.16302>

PARAMETER DESCRIPTION
image_size

images input size.

TYPE: int

patch_size

image patch size.

TYPE: int

stride

stride of the depthwise conv.

TYPE: int

base_dims

middle dim of each layer.

TYPE: List[int]

depth

model block depth of each layer.

TYPE: List[int]

heads

number of heads of multi-head attention of each layer.

TYPE: List[int]

mlp_ratio

ratio of hidden features in Mlp.

TYPE: float

num_classes

number of classification classes. Default: 1000.

TYPE: int DEFAULT: 1000

in_chans

number of input channels. Default: 3.

TYPE: int DEFAULT: 3

attn_drop_rate

attention layers dropout rate. Default: 0.

TYPE: float DEFAULT: 0.0

drop_rate

dropout rate. Default: 0.

TYPE: float DEFAULT: 0.0

drop_path_rate

drop path rate. Default: 0.

TYPE: float DEFAULT: 0.0

Source code in mindocr\models\backbones\mindcv_models\pit.py
class PoolingTransformer(nn.Cell):
    r"""PiT model class, based on
    `"Rethinking Spatial Dimensions of Vision Transformers"
    <https://arxiv.org/abs/2103.16302>`
    Args:
        image_size (int) : images input size.
        patch_size (int) : image patch size.
        stride (int) : stride of the depthwise conv.
        base_dims (List[int]) : middle dim of each layer.
        depth (List[int]) : model block depth of each layer.
        heads (List[int]) : number of heads of multi-head attention of each layer
        mlp_ratio (float) : ratio of hidden features in Mlp.
        num_classes (int) : number of classification classes. Default: 1000.
        in_chans (int) : number of input channels. Default: 3.
        attn_drop_rate (float) : attention layers dropout rate. Default: 0.
        drop_rate (float) : dropout rate. Default: 0.
        drop_path_rate (float) : drop path rate. Default: 0.
    """

    def __init__(
        self,
        image_size: int,
        patch_size: int,
        stride: int,
        base_dims: List[int],
        depth: List[int],
        heads: List[int],
        mlp_ratio: float,
        num_classes: int = 1000,
        in_chans: int = 3,
        attn_drop_rate: float = 0.0,
        drop_rate: float = 0.0,
        drop_path_rate: float = 0.0,
    ) -> None:
        super().__init__()

        total_block = sum(depth)
        padding = 0
        block_idx = 0

        width = math.floor((image_size + 2 * padding - patch_size) / stride + 1)

        self.base_dims = base_dims
        self.heads = heads
        self.num_classes = num_classes

        self.patch_size = patch_size
        self.pos_embed = Parameter(Tensor(np.random.randn(1, base_dims[0] * heads[0], width, width), mstype.float32))
        self.patch_embed = conv_embedding(in_chans, base_dims[0] * heads[0], patch_size, stride, padding)
        self.cls_token = Parameter(Tensor(np.random.randn(1, 1, base_dims[0] * heads[0]), mstype.float32))

        self.pos_drop = nn.Dropout(keep_prob=1.0 - drop_rate)
        self.tile = ops.Tile()

        self.transformers = nn.CellList([])
        self.pools = nn.CellList([])

        for stage in range(len(depth)):
            drop_path_prob = [drop_path_rate * i / total_block for i in range(block_idx, block_idx + depth[stage])]
            block_idx += depth[stage]
            self.transformers.append(
                Transformer(
                    base_dims[stage], depth[stage], heads[stage], mlp_ratio, drop_rate, attn_drop_rate, drop_path_prob
                )
            )
            if stage < len(heads) - 1:
                self.pools.append(
                    conv_head_pooling(
                        base_dims[stage] * heads[stage], base_dims[stage + 1] * heads[stage + 1], stride=2
                    )
                )

        self.norm = nn.LayerNorm((base_dims[-1] * heads[-1],), epsilon=1e-6)

        self.embed_dim = base_dims[-1] * heads[-1]

        # Classifier head
        if num_classes > 0:
            self.head = nn.Dense(in_channels=base_dims[-1] * heads[-1], out_channels=num_classes, has_bias=True)
        else:
            self.head = Identity()

        self.pos_embed.set_data(
            init.initializer(init.TruncatedNormal(sigma=0.02), self.pos_embed.shape, self.pos_embed.dtype)
        )
        self.cls_token.set_data(
            init.initializer(init.TruncatedNormal(sigma=0.02), self.cls_token.shape, self.cls_token.dtype)
        )
        self._initialize_weights()

    def _initialize_weights(self) -> None:
        """init_weights"""
        for _, cell in self.cells_and_names():
            if isinstance(cell, nn.LayerNorm):
                cell.gamma.set_data(init.initializer(init.One(), cell.gamma.shape, cell.gamma.dtype))
                cell.beta.set_data(init.initializer(init.Zero(), cell.beta.shape, cell.beta.dtype))
            if isinstance(cell, nn.Conv2d):
                n = cell.kernel_size[0] * cell.kernel_size[1] * cell.in_channels
                cell.weight.set_data(
                    init.initializer(init.Uniform(math.sqrt(1.0 / n)), cell.weight.shape, cell.weight.dtype)
                )
                if cell.bias is not None:
                    cell.bias.set_data(
                        init.initializer(init.Uniform(math.sqrt(1.0 / n)), cell.bias.shape, cell.bias.dtype)
                    )
            if isinstance(cell, nn.Dense):
                init_range = 1.0 / np.sqrt(cell.weight.shape[0])
                cell.weight.set_data(init.initializer(init.Uniform(init_range), cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(init.initializer(init.Uniform(init_range), cell.bias.shape, cell.bias.dtype))

    def forward_features(self, x: Tensor) -> Tensor:
        x = self.patch_embed(x)

        pos_embed = self.pos_embed
        x = self.pos_drop(x + pos_embed)

        cls_tokens = self.tile(self.cls_token, (x.shape[0], 1, 1))

        for stage in range(len(self.pools)):
            x, cls_tokens = self.transformers[stage](x, cls_tokens)
            x, cls_tokens = self.pools[stage](x, cls_tokens)
        x, cls_tokens = self.transformers[-1](x, cls_tokens)

        cls_tokens = self.norm(cls_tokens)

        return cls_tokens

    def forward_head(self, x: Tensor) -> Tensor:
        cls_token = self.head(x[:, 0])
        return cls_token

    def construct(self, x: Tensor) -> Tensor:
        cls_token = self.forward_features(x)
        cls_token = self.forward_head(cls_token)
        return cls_token
mindocr.models.backbones.mindcv_models.pit.Transformer

Bases: nn.Cell

define the transformer block of PiT

Source code in mindocr\models\backbones\mindcv_models\pit.py (lines 212-261)
class Transformer(nn.Cell):
    """define the transformer block of PiT"""

    def __init__(
        self,
        base_dim: int,
        depth: int,
        heads: int,
        mlp_ratio: float,
        drop_rate: float = 0.0,
        attn_drop_rate: float = 0.0,
        drop_path_prob: List[float] = None,
    ) -> None:
        super().__init__()
        self.layers = nn.CellList([])
        embed_dim = base_dim * heads

        if drop_path_prob is None:
            drop_path_prob = [0.0 for _ in range(depth)]

        self.blocks = nn.CellList(
            [
                Block(
                    dim=embed_dim,
                    num_heads=heads,
                    mlp_ratio=mlp_ratio,
                    qkv_bias=True,
                    drop=drop_rate,
                    attn_drop=attn_drop_rate,
                    drop_path=drop_path_prob[i],
                    norm_layer=nn.LayerNorm,
                )
                for i in range(depth)
            ]
        )

    def construct(self, x, cls_tokens):
        h, w = x.shape[2:4]
        x = ops.reshape(x, (x.shape[0], x.shape[1], h * w))
        x = ops.transpose(x, (0, 2, 1))
        token_length = cls_tokens.shape[1]
        x = ops.concat((cls_tokens, x), axis=1)
        for blk in self.blocks:
            x = blk(x)

        cls_tokens = x[:, :token_length]
        x = x[:, token_length:]
        x = ops.transpose(x, (0, 2, 1))
        x = ops.reshape(x, (x.shape[0], x.shape[1], h, w))
        return x, cls_tokens
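`Transformer.construct` flattens the `(B, C, H, W)` feature map into `(B, H*W, C)` tokens before running the blocks, then restores the map afterwards. A framework-agnostic NumPy sketch of that round-trip (illustrative only; the model itself uses `ops.reshape`/`ops.transpose`):

```python
import numpy as np

def to_tokens(x):
    # (B, C, H, W) -> (B, H*W, C), mirroring the reshape/transpose in construct
    b, c, h, w = x.shape
    return x.reshape(b, c, h * w).transpose(0, 2, 1)

def to_map(tokens, h, w):
    # (B, H*W, C) -> (B, C, H, W), the inverse applied after the blocks
    b, n, c = tokens.shape
    return tokens.transpose(0, 2, 1).reshape(b, c, h, w)

x = np.arange(2 * 3 * 4 * 4, dtype=np.float32).reshape(2, 3, 4, 4)
tokens = to_tokens(x)            # shape (2, 16, 3)
restored = to_map(tokens, 4, 4)  # recovers the original map exactly
assert np.array_equal(restored, x)
```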
mindocr.models.backbones.mindcv_models.pit.conv_embedding

Bases: nn.Cell

define embedding layer using conv2d

Source code in mindocr\models\backbones\mindcv_models\pit.py (lines 51-75)
class conv_embedding(nn.Cell):
    """define embedding layer using conv2d"""

    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        patch_size: int,
        stride: int,
        padding: int,
    ) -> None:
        super().__init__()
        self.conv = nn.Conv2d(
            in_channels,
            out_channels,
            kernel_size=patch_size,
            stride=stride,
            pad_mode="pad",
            padding=padding,
            has_bias=True,
        )

    def construct(self, x: Tensor) -> Tensor:
        x = self.conv(x)
        return x
mindocr.models.backbones.mindcv_models.pit.conv_head_pooling

Bases: nn.Cell

define pooling layer using conv on the spatial tokens, with an additional fully-connected layer to adjust the class token's channel size to match them

Source code in mindocr\models\backbones\mindcv_models\pit.py (lines 78-106)
class conv_head_pooling(nn.Cell):
    """define pooling layer using conv in spatial tokens with an additional fully-connected layer
    (to adjust the channel size to match the spatial tokens)"""

    def __init__(
        self,
        in_feature: int,
        out_feature: int,
        stride: int,
        pad_mode: str = "pad",
    ) -> None:
        super().__init__()
        self.conv = nn.Conv2d(
            in_feature,
            out_feature,
            kernel_size=stride + 1,
            padding=stride // 2,
            stride=stride,
            pad_mode=pad_mode,
            group=in_feature,
            has_bias=True,
        )
        self.fc = nn.Dense(in_channels=in_feature, out_channels=out_feature, has_bias=True)

    def construct(self, x, cls_token):
        x = self.conv(x)
        cls_token = self.fc(cls_token)

        return x, cls_token
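`conv_head_pooling` picks `kernel_size=stride + 1` and `padding=stride // 2`, which under the standard convolution output-size formula shrinks an even spatial size by exactly the stride. A quick check of that arithmetic (a sketch; assumes `pad_mode="pad"` so the explicit padding applies):

```python
def conv_out_size(size, kernel, stride, padding):
    # standard conv output size: floor((size + 2*pad - kernel) / stride) + 1
    return (size + 2 * padding - kernel) // stride + 1

# conv_head_pooling's choice of kernel_size=stride+1, padding=stride//2
# halves an even spatial size exactly when stride=2 (28 -> 14, 14 -> 7)
stride = 2
for h in (28, 14):
    assert conv_out_size(h, stride + 1, stride, stride // 2) == h // 2
```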
mindocr.models.backbones.mindcv_models.pit.pit_b(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get PiT-B model. Refer to the base class models.PoolingTransformer for more details.

Source code in mindocr\models\backbones\mindcv_models\pit.py (lines 426-447)
@register_model
def pit_b(pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3, **kwargs) -> PoolingTransformer:
    """Get PiT-B model.
    Refer to the base class `models.PoolingTransformer` for more details."""
    default_cfg = default_cfgs["pit_b_224"]
    model = PoolingTransformer(
        image_size=224,
        patch_size=14,
        stride=7,
        base_dims=[64, 64, 64],
        depth=[3, 6, 4],
        heads=[4, 8, 16],
        mlp_ratio=4.0,
        num_classes=num_classes,
        in_chans=in_channels,
        **kwargs
    )

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.pit.pit_s(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get PiT-S model. Refer to the base class models.PoolingTransformer for more details.

Source code in mindocr\models\backbones\mindcv_models\pit.py (lines 450-471)
@register_model
def pit_s(pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3, **kwargs) -> PoolingTransformer:
    """Get PiT-S model.
    Refer to the base class `models.PoolingTransformer` for more details."""
    default_cfg = default_cfgs["pit_s_224"]
    model = PoolingTransformer(
        image_size=224,
        patch_size=16,
        stride=8,
        base_dims=[48, 48, 48],
        depth=[2, 6, 4],
        heads=[3, 6, 12],
        mlp_ratio=4.0,
        num_classes=num_classes,
        in_chans=in_channels,
        **kwargs
    )

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.pit.pit_ti(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get PiT-Ti model. Refer to the base class models.PoolingTransformer for more details.

Source code in mindocr\models\backbones\mindcv_models\pit.py (lines 402-423)
@register_model
def pit_ti(pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3, **kwargs) -> PoolingTransformer:
    """Get PiT-Ti model.
    Refer to the base class `models.PoolingTransformer` for more details."""
    default_cfg = default_cfgs["pit_ti_224"]
    model = PoolingTransformer(
        image_size=224,
        patch_size=16,
        stride=8,
        base_dims=[32, 32, 32],
        depth=[2, 6, 4],
        heads=[2, 4, 8],
        mlp_ratio=4.0,
        num_classes=num_classes,
        in_chans=in_channels,
        **kwargs
    )

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.pit.pit_xs(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get PiT-XS model. Refer to the base class models.PoolingTransformer for more details.

Source code in mindocr\models\backbones\mindcv_models\pit.py (lines 474-495)
@register_model
def pit_xs(pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3, **kwargs) -> PoolingTransformer:
    """Get PiT-XS model.
    Refer to the base class `models.PoolingTransformer` for more details."""
    default_cfg = default_cfgs["pit_xs_224"]
    model = PoolingTransformer(
        image_size=224,
        patch_size=16,
        stride=8,
        base_dims=[48, 48, 48],
        depth=[2, 6, 4],
        heads=[2, 4, 8],
        mlp_ratio=4.0,
        num_classes=num_classes,
        in_chans=in_channels,
        **kwargs
    )

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
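Across the `pit_*` factories above, each stage's embedding dimension is the product `base_dim * heads` (see `Transformer.__init__`). A small pure-Python sketch of the resulting per-stage dims for the four variants:

```python
# (base_dims, heads) per stage, taken from the pit_* factories in this file
variants = {
    "pit_ti": ([32, 32, 32], [2, 4, 8]),
    "pit_xs": ([48, 48, 48], [2, 4, 8]),
    "pit_s": ([48, 48, 48], [3, 6, 12]),
    "pit_b": ([64, 64, 64], [4, 8, 16]),
}

def stage_dims(base_dims, heads):
    # per-stage embedding dim is base_dim * heads (Transformer.__init__)
    return [b * h for b, h in zip(base_dims, heads)]

assert stage_dims(*variants["pit_ti"]) == [64, 128, 256]
assert stage_dims(*variants["pit_b"]) == [256, 512, 1024]
```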
mindocr.models.backbones.mindcv_models.poolformer

MindSpore implementation of poolformer. Refer to PoolFormer: MetaFormer Is Actually What You Need for Vision.

mindocr.models.backbones.mindcv_models.poolformer.ConvMlp

Bases: nn.Cell

MLP using 1x1 convs that keeps spatial dims

Source code in mindocr\models\backbones\mindcv_models\poolformer.py (lines 62-103)
class ConvMlp(nn.Cell):
    """MLP using 1x1 convs that keeps spatial dims"""

    def __init__(
        self,
        in_features,
        hidden_features=None,
        out_features=None,
        act_layer=nn.GELU,
        norm_layer=None,
        bias=True,
        drop=0.0,
    ):
        super().__init__()
        out_features = out_features or in_features
        hidden_features = hidden_features or in_features
        bias = to_2tuple(bias)

        self.fc1 = nn.Conv2d(in_features, hidden_features, kernel_size=1, has_bias=bias[0])
        self.norm = norm_layer(hidden_features) if norm_layer else Identity()
        self.act = act_layer(approximate=False)
        self.drop = nn.Dropout(1 - drop)
        self.fc2 = nn.Conv2d(hidden_features, out_features, kernel_size=1, has_bias=bias[1])
        self.cls_init_weights()

    def cls_init_weights(self):
        """Initialize weights for cells."""
        for name, m in self.cells_and_names():
            if isinstance(m, nn.Conv2d):
                m.weight.set_data(
                    init.initializer(init.TruncatedNormal(sigma=.02), m.weight.shape, m.weight.dtype))
                if m.bias is not None:
                    m.bias.set_data(
                        init.initializer(init.Constant(0), m.bias.shape, m.bias.dtype))

    def construct(self, x):
        x = self.fc1(x)
        x = self.act(x)
        x = self.drop(x)
        x = self.fc2(x)
        x = self.drop(x)
        return x
mindocr.models.backbones.mindcv_models.poolformer.ConvMlp.cls_init_weights()

Initialize weights for cells.

Source code in mindocr\models\backbones\mindcv_models\poolformer.py (lines 87-95)
def cls_init_weights(self):
    """Initialize weights for cells."""
    for name, m in self.cells_and_names():
        if isinstance(m, nn.Conv2d):
            m.weight.set_data(
                init.initializer(init.TruncatedNormal(sigma=.02), m.weight.shape, m.weight.dtype))
            if m.bias is not None:
                m.bias.set_data(
                    init.initializer(init.Constant(0), m.bias.shape, m.bias.dtype))
mindocr.models.backbones.mindcv_models.poolformer.PatchEmbed

Bases: nn.Cell

Patch Embedding that is implemented by a layer of conv. Input: tensor in shape [B, C, H, W] Output: tensor in shape [B, C, H/stride, W/stride]

Source code in mindocr\models\backbones\mindcv_models\poolformer.py (lines 106-123)
class PatchEmbed(nn.Cell):
    """Patch Embedding that is implemented by a layer of conv.
    Input: tensor in shape [B, C, H, W]
    Output: tensor in shape [B, C, H/stride, W/stride]"""

    def __init__(self, in_chs=3, embed_dim=768, patch_size=16, stride=16, padding=0, norm_layer=None):
        super().__init__()
        patch_size = to_2tuple(patch_size)
        stride = to_2tuple(stride)
        # padding = to_2tuple(padding)
        self.proj = nn.Conv2d(in_chs, embed_dim, kernel_size=patch_size, stride=stride, padding=padding, pad_mode="pad",
                              has_bias=True)
        self.norm = norm_layer(embed_dim) if norm_layer else Identity()

    def construct(self, x):
        x = self.proj(x)
        x = self.norm(x)
        return x
mindocr.models.backbones.mindcv_models.poolformer.PoolFormer

Bases: nn.Cell

PoolFormer model class, based on "MetaFormer Is Actually What You Need for Vision" (https://arxiv.org/pdf/2111.11418v3.pdf)

PARAMETER DESCRIPTION
layers: number of blocks for the 4 stages.
embed_dims: embedding dims for the 4 stages. Default: (64, 128, 320, 512)
mlp_ratios: MLP ratios for the 4 stages. Default: (4, 4, 4, 4)
downsamples: flags to apply downsampling or not. Default: (True, True, True, True)
pool_size: pooling size for the 4 stages. Default: 3
in_chans: number of input channels. Default: 3
num_classes: number of classes for image classification. Default: 1000
global_pool: type of pooling layer. Default: 'avg'
norm_layer: type of normalization. Default: nn.GroupNorm
act_layer: type of activation. Default: nn.GELU
in_patch_size: patch size of the input patch embedding. Default: 7
in_stride: stride of the input patch embedding. Default: 4
in_pad: padding of the input patch embedding. Default: 2
down_patch_size: patch size of the downsampling patch embeddings. Default: 3
down_stride: stride of the downsampling patch embeddings. Default: 2
down_pad: padding of the downsampling patch embeddings. Default: 1
drop_rate: dropout rate of the layer before the main classifier. Default: 0.0
drop_path_rate: stochastic depth rate. Default: 0.0
layer_scale_init_value: LayerScale initial value. Default: 1e-5
fork_feat: whether to output features of the 4 stages, for dense prediction. Default: False

Source code in mindocr\models\backbones\mindcv_models\poolformer.py (lines 203-320)
class PoolFormer(nn.Cell):
    r"""PoolFormer model class, based on
    `"MetaFormer Is Actually What You Need for Vision" <https://arxiv.org/pdf/2111.11418v3.pdf>`_

    Args:
        layers: number of blocks for the 4 stages
        embed_dims: the embedding dims for the 4 stages. Default: (64, 128, 320, 512)
        mlp_ratios: mlp ratios for the 4 stages. Default: (4, 4, 4, 4)
        downsamples: flags to apply downsampling or not. Default: (True, True, True, True)
        pool_size: the pooling size for the 4 stages. Default: 3
        in_chans: number of input channels. Default: 3
        num_classes: number of classes for the image classification. Default: 1000
        global_pool: define the types of pooling layer. Default: avg
        norm_layer: define the types of normalization. Default: nn.GroupNorm
        act_layer: define the types of activation. Default: nn.GELU
        in_patch_size: specify the patch embedding for the input image. Default: 7
        in_stride: specify the stride for the input image. Default: 4.
        in_pad: specify the pad for the input image. Default: 2.
        down_patch_size: specify the downsample. Default: 3.
        down_stride: specify the downsample (patch embed.). Default: 2.
        down_pad: specify the downsample (patch embed.). Default: 1.
        drop_rate: dropout rate of the layer before main classifier. Default: 0.
        drop_path_rate: Stochastic Depth. Default: 0.
        layer_scale_init_value: LayerScale. Default: 1e-5.
        fork_feat: whether output features of the 4 stages, for dense prediction. Default: False.
    """

    def __init__(
        self,
        layers,
        embed_dims=(64, 128, 320, 512),
        mlp_ratios=(4, 4, 4, 4),
        downsamples=(True, True, True, True),
        pool_size=3,
        in_chans=3,
        num_classes=1000,
        global_pool="avg",
        norm_layer=nn.GroupNorm,
        act_layer=nn.GELU,
        in_patch_size=7,
        in_stride=4,
        in_pad=2,
        down_patch_size=3,
        down_stride=2,
        down_pad=1,
        drop_rate=0.0,
        drop_path_rate=0.0,
        layer_scale_init_value=1e-5,
        fork_feat=False,
    ):
        super().__init__()

        if not fork_feat:
            self.num_classes = num_classes
        self.fork_feat = fork_feat

        self.global_pool = global_pool
        self.num_features = embed_dims[-1]
        self.grad_checkpointing = False

        self.patch_embed = PatchEmbed(
            patch_size=in_patch_size, stride=in_stride, padding=in_pad,
            in_chs=in_chans, embed_dim=embed_dims[0])

        # set the main block in network
        network = []
        for i in range(len(layers)):
            network.append(basic_blocks(
                embed_dims[i], i, layers,
                pool_size=pool_size, mlp_ratio=mlp_ratios[i],
                act_layer=act_layer, norm_layer=norm_layer,
                drop_rate=drop_rate, drop_path_rate=drop_path_rate,
                layer_scale_init_value=layer_scale_init_value)
            )
            if i < len(layers) - 1 and (downsamples[i] or embed_dims[i] != embed_dims[i + 1]):
                # downsampling between stages
                network.append(PatchEmbed(
                    in_chs=embed_dims[i], embed_dim=embed_dims[i + 1],
                    patch_size=down_patch_size, stride=down_stride, padding=down_pad)
                )

        self.network = nn.SequentialCell(*network)
        self.norm = norm_layer(1, embed_dims[-1])
        self.head = nn.Dense(embed_dims[-1], num_classes, has_bias=True) if num_classes > 0 else Identity()
        # self._initialize_weights()
        self.cls_init_weights()

    def cls_init_weights(self):
        """Initialize weights for cells."""
        for name, m in self.cells_and_names():
            if isinstance(m, nn.Dense):
                m.weight.set_data(
                    init.initializer(init.TruncatedNormal(sigma=.02), m.weight.shape, m.weight.dtype))
                if m.bias is not None:
                    m.bias.set_data(
                        init.initializer(init.Constant(0), m.bias.shape, m.bias.dtype))

    def reset_classifier(self, num_classes, global_pool=None):
        self.num_classes = num_classes
        if global_pool is not None:
            self.global_pool = global_pool
        self.head = nn.Dense(self.num_features, num_classes) if num_classes > 0 else Identity()

    def forward_features(self, x: Tensor) -> Tensor:
        x = self.patch_embed(x)
        x = self.network(x)
        if self.fork_feat:
            # output features of four stages for dense prediction
            return x
        x = self.norm(x)
        return x

    def forward_head(self, x: Tensor) -> Tensor:
        return self.head(x.mean([-2, -1]))

    def construct(self, x: Tensor) -> Tensor:
        x = self.forward_features(x)
        return self.forward_head(x)
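With the defaults above (stem patch embedding with `in_stride=4`, then stride-2 downsampling between stages), the four PoolFormer stages see successively halved feature maps. A sketch of that resolution arithmetic, assuming each stride divides the size exactly (as with 224-pixel inputs):

```python
def stage_resolutions(img_size, in_stride=4, down_stride=2, num_stages=4):
    # stem patch embed divides the side length by in_stride; each later
    # stage transition divides it again by down_stride
    res = [img_size // in_stride]
    for _ in range(num_stages - 1):
        res.append(res[-1] // down_stride)
    return res

assert stage_resolutions(224) == [56, 28, 14, 7]
```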
mindocr.models.backbones.mindcv_models.poolformer.PoolFormer.cls_init_weights()

Initialize weights for cells.

Source code in mindocr\models\backbones\mindcv_models\poolformer.py (lines 290-298)
def cls_init_weights(self):
    """Initialize weights for cells."""
    for name, m in self.cells_and_names():
        if isinstance(m, nn.Dense):
            m.weight.set_data(
                init.initializer(init.TruncatedNormal(sigma=.02), m.weight.shape, m.weight.dtype))
            if m.bias is not None:
                m.bias.set_data(
                    init.initializer(init.Constant(0), m.bias.shape, m.bias.dtype))
mindocr.models.backbones.mindcv_models.poolformer.PoolFormerBlock

Bases: nn.Cell

Implementation of one PoolFormer block.

Source code in mindocr\models\backbones\mindcv_models\poolformer.py (lines 135-174)
class PoolFormerBlock(nn.Cell):
    """Implementation of one PoolFormer block."""

    def __init__(
        self,
        dim,
        pool_size=3,
        mlp_ratio=4.0,
        act_layer=nn.GELU,
        norm_layer=nn.GroupNorm,
        drop=0.0,
        drop_path=0.0,
        layer_scale_init_value=1e-5,
    ):
        super().__init__()
        self.norm1 = norm_layer(1, dim)
        self.token_mixer = Pooling(pool_size=pool_size)
        self.drop_path = DropPath(drop_path) if drop_path > 0.0 else Identity()
        self.norm2 = norm_layer(1, dim)
        self.mlp = ConvMlp(dim, hidden_features=int(dim * mlp_ratio), act_layer=act_layer, drop=drop)

        if layer_scale_init_value:
            layer_scale_init_tensor = Tensor(layer_scale_init_value * np.ones([dim]).astype(np.float32))
            self.layer_scale_1 = mindspore.Parameter(layer_scale_init_tensor)
            self.layer_scale_2 = mindspore.Parameter(layer_scale_init_tensor)
        else:
            self.layer_scale_1 = None
            self.layer_scale_2 = None
        self.expand_dims = ops.ExpandDims()

    def construct(self, x):
        if self.layer_scale_1 is not None:
            x = x + self.drop_path(
                self.expand_dims(self.expand_dims(self.layer_scale_1, -1), -1) * self.token_mixer(self.norm1(x)))
            x = x + self.drop_path(
                self.expand_dims(self.expand_dims(self.layer_scale_2, -1), -1) * self.mlp(self.norm2(x)))
        else:
            x = x + self.drop_path(self.token_mixer(self.norm1(x)))
            x = x + self.drop_path(self.mlp(self.norm2(x)))
        return x
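The `Pooling` token mixer used by `PoolFormerBlock` is not shown in this excerpt; per the PoolFormer paper it is a stride-1 average pooling with the input subtracted, so a constant feature map mixes to zero. A NumPy sketch of that behaviour (an illustration, not the actual `Pooling` cell; out-of-bounds positions are excluded from each window's mean):

```python
import numpy as np

def pooling_mixer(x, pool_size=3):
    # stride-1 average pooling over a pool_size x pool_size neighbourhood,
    # minus the input itself (the paper's token mixer)
    b, c, h, w = x.shape
    pad = pool_size // 2
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w):
            i0, i1 = max(i - pad, 0), min(i + pad + 1, h)
            j0, j1 = max(j - pad, 0), min(j + pad + 1, w)
            out[:, :, i, j] = x[:, :, i0:i1, j0:j1].mean(axis=(2, 3))
    return out - x

# a constant map carries no spatial information: the mixer outputs all zeros
assert np.allclose(pooling_mixer(np.full((1, 2, 4, 4), 5.0)), 0.0)
```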
mindocr.models.backbones.mindcv_models.poolformer.basic_blocks(dim, index, layers, pool_size=3, mlp_ratio=4.0, act_layer=nn.GELU, norm_layer=nn.GroupNorm, drop_rate=0.0, drop_path_rate=0.0, layer_scale_init_value=1e-05)

generate PoolFormer blocks for a stage

Source code in mindocr\models\backbones\mindcv_models\poolformer.py (lines 177-200)
def basic_blocks(
    dim,
    index,
    layers,
    pool_size=3,
    mlp_ratio=4.0,
    act_layer=nn.GELU,
    norm_layer=nn.GroupNorm,
    drop_rate=0.0,
    drop_path_rate=0.0,
    layer_scale_init_value=1e-5,
):
    """generate PoolFormer blocks for a stage"""
    blocks = []
    for block_idx in range(layers[index]):
        block_dpr = drop_path_rate * (block_idx + sum(layers[:index])) / (sum(layers) - 1)
        blocks.append(PoolFormerBlock(
            dim, pool_size=pool_size, mlp_ratio=mlp_ratio,
            act_layer=act_layer, norm_layer=norm_layer,
            drop=drop_rate, drop_path=block_dpr,
            layer_scale_init_value=layer_scale_init_value,
        ))
    blocks = nn.SequentialCell(*blocks)
    return blocks
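The `block_dpr` expression in `basic_blocks` ramps the stochastic-depth rate linearly from 0 at the first block of the network to `drop_path_rate` at the last, counting blocks across all stages. A pure-Python sketch of that schedule:

```python
def drop_path_schedule(layers, index, drop_path_rate):
    # per-block rates for stage `index`, as in basic_blocks: a linear ramp
    # from 0 (first block of the network) to drop_path_rate (last block)
    start = sum(layers[:index])
    total = sum(layers)
    return [drop_path_rate * (start + i) / (total - 1) for i in range(layers[index])]

layers = (2, 2, 6, 2)  # the poolformer_s12 configuration
rates = [r for i in range(len(layers)) for r in drop_path_schedule(layers, i, 0.1)]
assert rates[0] == 0.0 and abs(rates[-1] - 0.1) < 1e-9
assert all(b >= a for a, b in zip(rates, rates[1:]))  # monotonically increasing
```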
mindocr.models.backbones.mindcv_models.poolformer.poolformer_m36(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get poolformer_m36 model. Refer to the base class models.PoolFormer for more details.

Source code in mindocr\models\backbones\mindcv_models\poolformer.py (lines 358-375)
@register_model
def poolformer_m36(pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3, **kwargs) -> PoolFormer:
    """Get poolformer_m36 model.
    Refer to the base class `models.PoolFormer` for more details."""
    default_cfg = default_cfgs["poolformer_m36"]
    layers = (6, 6, 18, 6)
    embed_dims = (96, 192, 384, 768)
    model = PoolFormer(
        in_chans=in_channels,
        num_classes=num_classes,
        layers=layers,
        layer_scale_init_value=1e-6,
        embed_dims=embed_dims,
        **kwargs
    )
    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)
    return model
mindocr.models.backbones.mindcv_models.poolformer.poolformer_m48(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get poolformer_m48 model. Refer to the base class models.PoolFormer for more details.

Source code in mindocr\models\backbones\mindcv_models\poolformer.py (lines 378-395)
@register_model
def poolformer_m48(pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3, **kwargs) -> PoolFormer:
    """Get poolformer_m48 model.
    Refer to the base class `models.PoolFormer` for more details."""
    default_cfg = default_cfgs["poolformer_m48"]
    layers = (8, 8, 24, 8)
    embed_dims = (96, 192, 384, 768)
    model = PoolFormer(
        in_chans=in_channels,
        num_classes=num_classes,
        layers=layers,
        layer_scale_init_value=1e-6,
        embed_dims=embed_dims,
        **kwargs
    )
    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)
    return model
mindocr.models.backbones.mindcv_models.poolformer.poolformer_s12(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get poolformer_s12 model. Refer to the base class models.PoolFormer for more details.

Source code in mindocr\models\backbones\mindcv_models\poolformer.py (lines 323-331)
@register_model
def poolformer_s12(pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3, **kwargs) -> PoolFormer:
    """Get poolformer_s12 model.
    Refer to the base class `models.PoolFormer` for more details."""
    default_cfg = default_cfgs["poolformer_s12"]
    model = PoolFormer(in_chans=in_channels, num_classes=num_classes, layers=(2, 2, 6, 2), **kwargs)
    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)
    return model
mindocr.models.backbones.mindcv_models.poolformer.poolformer_s24(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get poolformer_s24 model. Refer to the base class models.PoolFormer for more details.

Source code in mindocr\models\backbones\mindcv_models\poolformer.py (lines 334-342)
@register_model
def poolformer_s24(pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3, **kwargs) -> PoolFormer:
    """Get poolformer_s24 model.
    Refer to the base class `models.PoolFormer` for more details."""
    default_cfg = default_cfgs["poolformer_s24"]
    model = PoolFormer(in_chans=in_channels, num_classes=num_classes, layers=(4, 4, 12, 4), **kwargs)
    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)
    return model
mindocr.models.backbones.mindcv_models.poolformer.poolformer_s36(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get poolformer_s36 model. Refer to the base class models.PoolFormer for more details.

Source code in mindocr\models\backbones\mindcv_models\poolformer.py (lines 345-355)
@register_model
def poolformer_s36(pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3, **kwargs) -> PoolFormer:
    """Get poolformer_s36 model.
    Refer to the base class `models.PoolFormer` for more details."""
    default_cfg = default_cfgs["poolformer_s36"]
    model = PoolFormer(
        in_chans=in_channels, num_classes=num_classes, layers=(6, 6, 18, 6), layer_scale_init_value=1e-6, **kwargs
    )
    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)
    return model
mindocr.models.backbones.mindcv_models.pvt

MindSpore implementation of PVT. Refer to PVT: Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

mindocr.models.backbones.mindcv_models.pvt.Attention

Bases: nn.Cell

spatial-reduction attention (SRA)

Source code in mindocr\models\backbones\mindcv_models\pvt.py (lines 48-109)
class Attention(nn.Cell):
    """spatial-reduction attention (SRA)"""

    def __init__(
        self,
        dim: int,
        num_heads: int = 8,
        qkv_bias: bool = False,
        qk_scale: Optional[float] = None,
        attn_drop: float = 0.0,
        proj_drop: float = 0.0,
        sr_ratio: int = 1,
    ):
        super(Attention, self).__init__()
        assert dim % num_heads == 0, f"dim {dim} should be divided by num_heads {num_heads}."

        self.dim = dim
        self.num_heads = num_heads
        head_dim = dim // num_heads
        self.scale = qk_scale or head_dim**-0.5

        self.q = nn.Dense(dim, dim, has_bias=qkv_bias)
        self.kv = nn.Dense(dim, dim * 2, has_bias=qkv_bias)
        self.attn_drop = nn.Dropout(1 - attn_drop)
        self.proj = nn.Dense(dim, dim)
        self.proj_drop = nn.Dropout(1 - proj_drop)
        self.qk_batmatmul = ops.BatchMatMul(transpose_b=True)
        self.batmatmul = ops.BatchMatMul()
        self.softmax = nn.Softmax(axis=-1)
        self.reshape = ops.reshape
        self.transpose = ops.transpose

        self.sr_ratio = sr_ratio
        if sr_ratio > 1:
            self.sr = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio, has_bias=True)
            self.norm = nn.LayerNorm([dim])

    def construct(self, x, H, W):
        B, N, C = x.shape
        q = self.q(x)
        q = self.reshape(q, (B, N, self.num_heads, C // self.num_heads))
        q = self.transpose(q, (0, 2, 1, 3))
        if self.sr_ratio > 1:
            x_ = self.reshape(self.transpose(x, (0, 2, 1)), (B, C, H, W))

            x_ = self.transpose(self.reshape(self.sr(x_), (B, C, -1)), (0, 2, 1))
            x_ = self.norm(x_)
            kv = self.kv(x_)

            kv = self.transpose(self.reshape(kv, (B, -1, 2, self.num_heads, C // self.num_heads)), (2, 0, 3, 1, 4))
        else:
            kv = self.kv(x)
            kv = self.transpose(self.reshape(kv, (B, -1, 2, self.num_heads, C // self.num_heads)), (2, 0, 3, 1, 4))
        k, v = kv[0], kv[1]
        attn = self.qk_batmatmul(q, k) * self.scale
        attn = self.softmax(attn)
        attn = self.attn_drop(attn)
        x = self.batmatmul(attn, v)
        x = self.reshape(self.transpose(x, (0, 2, 1, 3)), (B, N, C))
        x = self.proj(x)
        x = self.proj_drop(x)
        return x
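With `sr_ratio > 1`, `Attention` computes keys and values from a feature map reduced by `sr_ratio` in each spatial dimension, so the attention matrix shrinks from `N x N` to `N x N/sr_ratio**2`. A small shape-arithmetic sketch (a hypothetical helper; assumes `h` and `w` are divisible by `sr_ratio`, and uses the paper's per-stage `sr_ratio` values as an example):

```python
def sra_attn_shape(h, w, sr_ratio, num_heads):
    # queries keep all h*w tokens; keys/values come from a map reduced by
    # sr_ratio per spatial dim, so attention is N x N/sr_ratio**2 per head
    n = h * w
    n_kv = (h // sr_ratio) * (w // sr_ratio)
    return (num_heads, n, n_kv)

# e.g. a 56x56 token map with sr_ratio 8: 49 keys per query instead of 3136
assert sra_attn_shape(56, 56, 8, 1) == (1, 3136, 49)
```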
mindocr.models.backbones.mindcv_models.pvt.Block

Bases: nn.Cell

Block with spatial-reduction attention (SRA) and feed forward

Source code in mindocr\models\backbones\mindcv_models\pvt.py (lines 112-134)
class Block(nn.Cell):
    """ Block with spatial-reduction attention (SRA) and feed forward"""
    def __init__(self, dim, num_heads, mlp_ratio=4., qkv_bias=False, qk_scale=None, drop=0., attn_drop=0.,
                 drop_path=0., act_layer=nn.GELU, norm_layer=nn.LayerNorm, sr_ratio=1):
        super(Block, self).__init__()
        self.norm1 = norm_layer([dim], epsilon=1e-5)
        self.attn = Attention(
            dim,
            num_heads=num_heads, qkv_bias=qkv_bias, qk_scale=qk_scale,
            attn_drop=attn_drop, proj_drop=drop, sr_ratio=sr_ratio)
        # NOTE: drop path for stochastic depth, we shall see if this is better than dropout here
        self.drop_path = DropPath(drop_path) if drop_path > 0.0 else Identity()
        self.norm2 = norm_layer([dim])
        mlp_hidden_dim = int(dim * mlp_ratio)
        self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)

    def construct(self, x, H, W):
        x1 = self.norm1(x)
        x1 = self.attn(x1, H, W)
        x = x + self.drop_path(x1)
        x = x + self.drop_path(self.mlp(self.norm2(x)))

        return x
mindocr.models.backbones.mindcv_models.pvt.PatchEmbed

Bases: nn.Cell

Image to Patch Embedding

Source code in mindocr\models\backbones\mindcv_models\pvt.py
class PatchEmbed(nn.Cell):
    """Image to Patch Embedding"""

    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()

        img_size = (img_size, img_size)
        patch_size = (patch_size, patch_size)

        self.img_size = img_size
        self.patch_size = patch_size

        self.H, self.W = img_size[0] // patch_size[0], img_size[1] // patch_size[1]
        self.num_patches = self.H * self.W
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size, has_bias=True)
        self.norm = nn.LayerNorm([embed_dim], epsilon=1e-5)
        self.reshape = ops.reshape
        self.transpose = ops.transpose

    def construct(self, x):
        B, C, H, W = x.shape

        x = self.proj(x)
        b, c, h, w = x.shape
        x = self.reshape(x, (b, c, h * w))
        x = self.transpose(x, (0, 2, 1))
        x = self.norm(x)
        H, W = H // self.patch_size[0], W // self.patch_size[1]

        return x, (H, W)
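The grid size that `__init__` precomputes is plain integer arithmetic; a minimal sketch:

```python
def patch_grid(img_size, patch_size):
    """(H, W, num_patches) produced by PatchEmbed for a square input."""
    h = img_size // patch_size
    w = img_size // patch_size
    return h, w, h * w

# ViT-style 16x16 patches on a 224 input give a 14x14 grid of 196 patches;
# PVT's stage-1 patch_size=4 gives 56x56 = 3136.
```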
mindocr.models.backbones.mindcv_models.pvt.PyramidVisionTransformer

Bases: nn.Cell

Pyramid Vision Transformer model class, based on "Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions" <https://arxiv.org/abs/2102.12122>

PARAMETER DESCRIPTION
img_size

size of the input image.

TYPE: int DEFAULT: 224

patch_size

size of a single image patch.

TYPE: int DEFAULT: 4

in_chans

number of input channels.

TYPE: int DEFAULT: 3

num_classes

number of classification classes.

TYPE: int DEFAULT: 1000

embed_dims

hidden dimension of each stage's PatchEmbed.

TYPE: list DEFAULT: [64, 128, 320, 512]

num_heads

number of attention heads in each stage.

TYPE: list DEFAULT: [1, 2, 5, 8]

mlp_ratios

ratios of MLP hidden dims in each stage.

TYPE: list DEFAULT: [8, 8, 4, 4]

qkv_bias

whether to use bias in attention.

TYPE: bool DEFAULT: True

qk_scale

scale multiplied with qk in attention if not None; otherwise head_dim ** -0.5.

TYPE: float DEFAULT: None

drop_rate

drop rate for each block.

TYPE: float DEFAULT: 0.0

attn_drop_rate

drop rate for attention.

TYPE: float DEFAULT: 0.0

drop_path_rate

drop rate for drop path.

TYPE: float DEFAULT: 0.0

norm_layer

norm layer used in blocks.

TYPE: nn.Cell DEFAULT: nn.LayerNorm

depths

number of Blocks in each stage.

TYPE: list DEFAULT: [2, 2, 2, 2]

sr_ratios

spatial-reduction ratio (stride and kernel size) of each stage's attention.

TYPE: list DEFAULT: [8, 4, 2, 1]

num_stages

number of stages.

TYPE: int DEFAULT: 4

Source code in mindocr\models\backbones\mindcv_models\pvt.py
class PyramidVisionTransformer(nn.Cell):
    r"""Pyramid Vision Transformer model class, based on
    `"Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions" <https://arxiv.org/abs/2102.12122>`_  # noqa: E501

    Args:
        img_size (int): size of the input image. Default: 224.
        patch_size (int): size of a single image patch. Default: 4.
        in_chans (int): number of input channels. Default: 3.
        num_classes (int): number of classification classes. Default: 1000.
        embed_dims (list): hidden dimension of each stage's PatchEmbed. Default: [64, 128, 320, 512].
        num_heads (list): number of attention heads in each stage. Default: [1, 2, 5, 8].
        mlp_ratios (list): ratios of MLP hidden dims in each stage. Default: [8, 8, 4, 4].
        qkv_bias (bool): whether to use bias in attention. Default: True.
        qk_scale (float): scale multiplied with qk in attention if not None; otherwise head_dim ** -0.5. Default: None.
        drop_rate (float): drop rate for each block. Default: 0.0.
        attn_drop_rate (float): drop rate for attention. Default: 0.0.
        drop_path_rate (float): drop rate for drop path. Default: 0.0.
        norm_layer (nn.Cell): norm layer used in blocks. Default: nn.LayerNorm.
        depths (list): number of Blocks in each stage. Default: [2, 2, 2, 2].
        sr_ratios (list): spatial-reduction ratio (stride and kernel size) of each stage's attention. Default: [8, 4, 2, 1].
        num_stages (int): number of stages. Default: 4.
    """

    def __init__(self, img_size=224, patch_size=4, in_chans=3, num_classes=1000, embed_dims=[64, 128, 320, 512],
                 num_heads=[1, 2, 5, 8], mlp_ratios=[8, 8, 4, 4], qkv_bias=True, qk_scale=None, drop_rate=0.0,
                 attn_drop_rate=0.0, drop_path_rate=0.0, norm_layer=nn.LayerNorm,
                 depths=[2, 2, 2, 2], sr_ratios=[8, 4, 2, 1], num_stages=4):
        super(PyramidVisionTransformer, self).__init__()
        self.num_classes = num_classes
        self.depths = depths
        self.num_stages = num_stages
        start = Tensor(0, mindspore.float32)
        stop = Tensor(drop_path_rate, mindspore.float32)
        dpr = [float(x) for x in ops.linspace(start, stop, sum(depths))]  # stochastic depth decay rule
        cur = 0
        b_list = []
        self.pos_embed = []
        self.pos_drop = nn.Dropout(1 - drop_rate)
        for i in range(num_stages):
            block = nn.CellList(
                [Block(dim=embed_dims[i], num_heads=num_heads[i], mlp_ratio=mlp_ratios[i], qkv_bias=qkv_bias,
                       qk_scale=qk_scale, drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[cur + j],
                       norm_layer=norm_layer, sr_ratio=sr_ratios[i])
                 for j in range(depths[i])
                 ])

            b_list.append(block)
            cur += depths[i]  # advance the drop-path-rate offset by this stage's depth

        self.patch_embed1 = PatchEmbed(img_size=img_size,
                                       patch_size=patch_size,
                                       in_chans=in_chans,
                                       embed_dim=embed_dims[0])
        num_patches = self.patch_embed1.num_patches
        self.pos_embed1 = mindspore.Parameter(ops.zeros((1, num_patches, embed_dims[0]), mindspore.float16))
        self.pos_drop1 = nn.Dropout(1 - drop_rate)

        self.patch_embed2 = PatchEmbed(img_size=img_size // (2 ** (1 + 1)),
                                       patch_size=2,
                                       in_chans=embed_dims[1 - 1],
                                       embed_dim=embed_dims[1])
        num_patches = self.patch_embed2.num_patches
        self.pos_embed2 = mindspore.Parameter(ops.zeros((1, num_patches, embed_dims[1]), mindspore.float16))
        self.pos_drop2 = nn.Dropout(1 - drop_rate)

        self.patch_embed3 = PatchEmbed(img_size=img_size // (2 ** (2 + 1)),
                                       patch_size=2,
                                       in_chans=embed_dims[2 - 1],
                                       embed_dim=embed_dims[2])
        num_patches = self.patch_embed3.num_patches
        self.pos_embed3 = mindspore.Parameter(ops.zeros((1, num_patches, embed_dims[2]), mindspore.float16))
        self.pos_drop3 = nn.Dropout(1 - drop_rate)

        self.patch_embed4 = PatchEmbed(img_size // (2 ** (3 + 1)),
                                       patch_size=2,
                                       in_chans=embed_dims[3 - 1],
                                       embed_dim=embed_dims[3])
        num_patches = self.patch_embed4.num_patches + 1
        self.pos_embed4 = mindspore.Parameter(ops.zeros((1, num_patches, embed_dims[3]), mindspore.float16))
        self.pos_drop4 = nn.Dropout(1 - drop_rate)
        self.Blocks = nn.CellList(b_list)

        self.norm = norm_layer([embed_dims[3]])

        # cls_token
        self.cls_token = mindspore.Parameter(ops.zeros((1, 1, embed_dims[3]), mindspore.float32))

        # classification head
        self.head = nn.Dense(embed_dims[3], num_classes) if num_classes > 0 else Identity()
        self.reshape = ops.reshape
        self.transpose = ops.transpose
        self.tile = ops.Tile()
        self.Concat = ops.Concat(axis=1)
        self._initialize_weights()

    def _initialize_weights(self):
        for _, cell in self.cells_and_names():
            if isinstance(cell, nn.Dense):
                cell.weight.set_data(weight_init.initializer(weight_init.TruncatedNormal(sigma=0.02),
                                                             cell.weight.shape, cell.weight.dtype))
                if isinstance(cell, nn.Dense) and cell.bias is not None:
                    cell.bias.set_data(weight_init.initializer(weight_init.Zero(), cell.bias.shape, cell.bias.dtype))
            elif isinstance(cell, nn.LayerNorm):
                cell.gamma.set_data(weight_init.initializer(weight_init.One(), cell.gamma.shape, cell.gamma.dtype))
                cell.beta.set_data(weight_init.initializer(weight_init.Zero(), cell.beta.shape, cell.beta.dtype))
            elif isinstance(cell, nn.Conv2d):
                fan_out = cell.kernel_size[0] * cell.kernel_size[1] * cell.out_channels
                fan_out //= cell.group
                cell.weight.set_data(weight_init.initializer(weight_init.Normal(sigma=math.sqrt(2.0 / fan_out)),
                                                             cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(weight_init.initializer(weight_init.Zero(), cell.bias.shape, cell.bias.dtype))

    def get_classifier(self):
        return self.head

    def reset_classifier(self, num_classes, global_pool=""):
        self.num_classes = num_classes
        self.head = nn.Dense(self.embed_dim, num_classes) if num_classes > 0 else Identity()

    def _get_pos_embed(self, pos_embed, ph, pw, H, W):
        if H * W == self.patch_embed1.num_patches:
            return pos_embed
        else:
            ResizeBilinear = nn.ResizeBilinear()

            pos_embed = self.transpose(self.reshape(pos_embed, (1, ph, pw, -1)), (0, 3, 1, 2))
            pos_embed = ResizeBilinear(pos_embed, (H, W))

            pos_embed = self.transpose(self.reshape(pos_embed, (1, -1, H * W)), (0, 2, 1))

            return pos_embed

    def forward_features(self, x):
        B = x.shape[0]

        x, (H, W) = self.patch_embed1(x)
        pos_embed = self.pos_embed1
        x = self.pos_drop1(x + pos_embed)
        for blk in self.Blocks[0]:
            x = blk(x, H, W)
        x = self.transpose(self.reshape(x, (B, H, W, -1)), (0, 3, 1, 2))

        x, (H, W) = self.patch_embed2(x)
        ph, pw = self.patch_embed2.H, self.patch_embed2.W
        pos_embed = self._get_pos_embed(self.pos_embed2, ph, pw, H, W)
        x = self.pos_drop2(x + pos_embed)
        for blk in self.Blocks[1]:
            x = blk(x, H, W)
        x = self.transpose(self.reshape(x, (B, H, W, -1)), (0, 3, 1, 2))

        x, (H, W) = self.patch_embed3(x)
        ph, pw = self.patch_embed3.H, self.patch_embed3.W
        pos_embed = self._get_pos_embed(self.pos_embed3, ph, pw, H, W)
        x = self.pos_drop3(x + pos_embed)
        for blk in self.Blocks[2]:
            x = blk(x, H, W)
        x = self.transpose(self.reshape(x, (B, H, W, -1)), (0, 3, 1, 2))

        x, (H, W) = self.patch_embed4(x)
        cls_tokens = self.tile(self.cls_token, (B, 1, 1))

        x = self.Concat((cls_tokens, x))
        ph, pw = self.patch_embed4.H, self.patch_embed4.W
        pos_embed_ = self._get_pos_embed(self.pos_embed4[:, 1:], ph, pw, H, W)
        pos_embed = self.Concat((self.pos_embed4[:, 0:1], pos_embed_))
        x = self.pos_drop4(x + pos_embed)
        for blk in self.Blocks[3]:
            x = blk(x, H, W)

        x = self.norm(x)

        return x[:, 0]

    def forward_head(self, x: Tensor) -> Tensor:
        return self.head(x)

    def construct(self, x):
        x = self.forward_features(x)
        x = self.forward_head(x)

        return x
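Putting the four stages together: each stage halves the spatial side relative to the previous one, so the overall strides are 4, 8, 16 and 32. A plain-Python sketch of the per-stage feature-map shapes under the default configuration:

```python
def pvt_stage_shapes(img_size=224, embed_dims=(64, 128, 320, 512)):
    """(side, side, channels) of each stage's output feature map."""
    shapes = []
    for i, dim in enumerate(embed_dims):
        stride = 4 * (2 ** i)       # overall downsampling at stage i
        side = img_size // stride
        shapes.append((side, side, dim))
    return shapes
```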
mindocr.models.backbones.mindcv_models.pvt.pvt_large(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get the PVT large model. Refer to the base class "models.PVT" for more details.

Source code in mindocr\models\backbones\mindcv_models\pvt.py
@register_model
def pvt_large(
    pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3, **kwargs
) -> PyramidVisionTransformer:
    """Get PVT large model
    Refer to the base class "models.PVT" for more details.
    """
    default_cfg = default_cfgs['pvt_large']
    model = PyramidVisionTransformer(in_chans=in_channels, num_classes=num_classes,
                                     patch_size=4, embed_dims=[64, 128, 320, 512], num_heads=[1, 2, 5, 8],
                                     mlp_ratios=[8, 8, 4, 4], qkv_bias=True,
                                     norm_layer=partial(nn.LayerNorm, epsilon=1e-6), depths=[3, 8, 27, 3],
                                     sr_ratios=[8, 4, 2, 1], **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.pvt.pvt_medium(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get the PVT medium model. Refer to the base class "models.PVT" for more details.

Source code in mindocr\models\backbones\mindcv_models\pvt.py
@register_model
def pvt_medium(
    pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3, **kwargs
) -> PyramidVisionTransformer:
    """Get PVT medium model
    Refer to the base class "models.PVT" for more details.
    """
    default_cfg = default_cfgs['pvt_medium']
    model = PyramidVisionTransformer(in_chans=in_channels, num_classes=num_classes,
                                     patch_size=4, embed_dims=[64, 128, 320, 512], num_heads=[1, 2, 5, 8],
                                     mlp_ratios=[8, 8, 4, 4], qkv_bias=True,
                                     norm_layer=partial(nn.LayerNorm, epsilon=1e-6), depths=[3, 4, 18, 3],
                                     sr_ratios=[8, 4, 2, 1], **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.pvt.pvt_small(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get the PVT small model. Refer to the base class "models.PVT" for more details.

Source code in mindocr\models\backbones\mindcv_models\pvt.py
@register_model
def pvt_small(
    pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3, **kwargs
) -> PyramidVisionTransformer:
    """Get PVT small model
    Refer to the base class "models.PVT" for more details.
    """
    default_cfg = default_cfgs['pvt_small']
    model = PyramidVisionTransformer(in_chans=in_channels, num_classes=num_classes,
                                     patch_size=4, embed_dims=[64, 128, 320, 512], num_heads=[1, 2, 5, 8],
                                     mlp_ratios=[8, 8, 4, 4], qkv_bias=True,
                                     norm_layer=partial(nn.LayerNorm, epsilon=1e-6), depths=[3, 4, 6, 3],
                                     sr_ratios=[8, 4, 2, 1], **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.pvt.pvt_tiny(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get the PVT tiny model. Refer to the base class "models.PVT" for more details.

Source code in mindocr\models\backbones\mindcv_models\pvt.py
@register_model
def pvt_tiny(
    pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3, **kwargs
) -> PyramidVisionTransformer:
    """Get PVT tiny model
    Refer to the base class "models.PVT" for more details.
    """
    default_cfg = default_cfgs['pvt_tiny']
    model = PyramidVisionTransformer(in_chans=in_channels, num_classes=num_classes,
                                     patch_size=4, embed_dims=[64, 128, 320, 512], num_heads=[1, 2, 5, 8],
                                     mlp_ratios=[8, 8, 4, 4], qkv_bias=True,
                                     norm_layer=partial(nn.LayerNorm, epsilon=1e-6), depths=[2, 2, 2, 2],
                                     sr_ratios=[8, 4, 2, 1], **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
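The four registered variants differ only in their stage depths; a small summary of the values used by the factory functions above:

```python
# Per-stage Block counts of each registered PVT variant (from the factories above).
PVT_DEPTHS = {
    "pvt_tiny":   [2, 2, 2, 2],
    "pvt_small":  [3, 4, 6, 3],
    "pvt_medium": [3, 4, 18, 3],
    "pvt_large":  [3, 8, 27, 3],
}

def total_blocks(name):
    """Total number of transformer Blocks in a variant."""
    return sum(PVT_DEPTHS[name])
```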
mindocr.models.backbones.mindcv_models.pvtv2

MindSpore implementation of PVTv2. Refer to "PVTv2: Improved Baselines with Pyramid Vision Transformer".

mindocr.models.backbones.mindcv_models.pvtv2.Attention

Bases: nn.Cell

Linear Spatial Reduction Attention

Source code in mindocr\models\backbones\mindcv_models\pvtv2.py
class Attention(nn.Cell):
    """Linear Spatial Reduction Attention"""

    def __init__(self, dim, num_heads=8, qkv_bias=False, qk_scale=None, attn_drop=0., proj_drop=0., sr_ratio=1,
                 linear=False):
        super().__init__()
        assert dim % num_heads == 0, f"dim {dim} should be divided by num_heads {num_heads}."

        self.dim = dim
        self.num_heads = num_heads
        head_dim = dim // num_heads
        self.scale = qk_scale or head_dim**-0.5

        self.q = nn.Dense(dim, dim, has_bias=qkv_bias)
        self.kv = nn.Dense(dim, dim * 2, has_bias=qkv_bias)
        self.attn_drop = nn.Dropout(1 - attn_drop)
        self.proj = nn.Dense(dim, dim)
        self.proj_drop = nn.Dropout(1 - proj_drop)
        self.qk_batmatmul = ops.BatchMatMul(transpose_b=True)
        self.batmatmul = ops.BatchMatMul()
        self.softmax = nn.Softmax(axis=-1)

        self.linear = linear
        self.sr_ratio = sr_ratio
        if not linear:
            if sr_ratio > 1:
                self.sr = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio, has_bias=True)
                self.norm = nn.LayerNorm([dim])

        else:
            self.pool = nn.AdaptiveAvgPool2d(7)
            self.sr = nn.Conv2d(dim, dim, kernel_size=1, stride=1, has_bias=True)
            self.norm = nn.LayerNorm([dim])
            self.act = nn.GELU()

    def construct(self, x, H, W):
        B, N, C = x.shape
        q = self.q(x)
        q = ops.reshape(q, (B, N, self.num_heads, C // self.num_heads))
        q = ops.transpose(q, (0, 2, 1, 3))

        if not self.linear:
            if self.sr_ratio > 1:
                x_ = ops.reshape(ops.transpose(x, (0, 2, 1)), (B, C, H, W))

                x_ = self.sr(x_)
                x_ = ops.transpose(ops.reshape(x_, (B, C, -1)), (0, 2, 1))
                x_ = self.norm(x_)

                kv = self.kv(x_)
                kv = ops.transpose(ops.reshape(kv, (B, -1, 2, self.num_heads, C // self.num_heads)), (2, 0, 3, 1, 4))
            else:
                kv = self.kv(x)
                kv = ops.transpose(ops.reshape(kv, (B, -1, 2, self.num_heads, C // self.num_heads)), (2, 0, 3, 1, 4))

        else:
            x_ = ops.reshape(ops.transpose(x, (0, 2, 1)), (B, C, H, W))
            x_ = self.sr(self.pool(x_))
            # flatten the pooled 7x7 map to (B, 49, C) before LayerNorm
            x_ = ops.transpose(ops.reshape(x_, (B, C, -1)), (0, 2, 1))
            x_ = self.norm(x_)
            x_ = self.act(x_)
            kv = ops.transpose(ops.reshape(self.kv(x_), (B, -1, 2, self.num_heads, C // self.num_heads)),
                               (2, 0, 3, 1, 4))
        k, v = kv[0], kv[1]

        attn = self.qk_batmatmul(q, k) * self.scale
        attn = self.softmax(attn)
        attn = self.attn_drop(attn)

        x = self.batmatmul(attn, v)
        x = ops.reshape(ops.transpose(x, (0, 2, 1, 3)), (B, N, C))
        x = self.proj(x)
        x = self.proj_drop(x)

        return x
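In linear mode, `AdaptiveAvgPool2d(7)` fixes the key/value sequence at 7x7 = 49 tokens regardless of input resolution, which is what makes the attention cost linear in the number of queries. A sketch of the kv-token count in each branch of `construct`:

```python
def kv_tokens(h, w, sr_ratio=1, linear=False):
    """Key/value token count for PVTv2 attention on an h x w token grid."""
    if linear:
        return 7 * 7                          # AdaptiveAvgPool2d(7) output
    if sr_ratio > 1:
        return (h // sr_ratio) * (w // sr_ratio)
    return h * w                              # plain attention, no reduction
```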
mindocr.models.backbones.mindcv_models.pvtv2.Block

Bases: nn.Cell

Block with Linear Spatial Reduction Attention and Convolutional Feed-Forward

Source code in mindocr\models\backbones\mindcv_models\pvtv2.py
class Block(nn.Cell):
    """Block with Linear Spatial Reduction Attention and Convolutional Feed-Forward"""

    def __init__(self, dim, num_heads, mlp_ratio=4., qkv_bias=False, qk_scale=None, drop=0., attn_drop=0.,
                 drop_path=0., act_layer=nn.GELU, norm_layer=nn.LayerNorm, sr_ratio=1, linear=False, block_id=0):
        super().__init__()
        self.norm1 = norm_layer([dim])

        self.attn = Attention(
            dim,
            num_heads=num_heads, qkv_bias=qkv_bias, qk_scale=qk_scale,
            attn_drop=attn_drop, proj_drop=drop, sr_ratio=sr_ratio, linear=linear)

        # NOTE: drop path for stochastic depth, we shall see if this is better than dropout here
        self.drop_path = DropPath(drop_path) if drop_path > 0.0 else Identity()

        self.norm2 = norm_layer([dim])

        mlp_hidden_dim = int(dim * mlp_ratio)
        self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop, linear=linear)

    def construct(self, x, H, W):
        x = x + self.drop_path(self.attn(self.norm1(x), H, W))
        x = x + self.drop_path(self.mlp(self.norm2(x), H, W))

        return x
mindocr.models.backbones.mindcv_models.pvtv2.DWConv

Bases: nn.Cell

Depthwise separable convolution

Source code in mindocr\models\backbones\mindcv_models\pvtv2.py
class DWConv(nn.Cell):
    """Depthwise separable convolution"""

    def __init__(self, dim=768):
        super(DWConv, self).__init__()
        self.dwconv = nn.Conv2d(dim, dim, 3, 1, has_bias=True, group=dim)

    def construct(self, x, H, W):
        B, N, C = x.shape
        x = ops.transpose(x, (0, 2, 1)).view((B, C, H, W))
        x = self.dwconv(x)
        x = ops.transpose(x.view((B, C, H * W)), (0, 2, 1))

        return x
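Because `group=dim` makes the convolution depthwise, it costs `dim * k * k` weights rather than `dim * dim * k * k`. A parameter-count sketch:

```python
def conv2d_params(cin, cout, k, groups=1, bias=True):
    """Weight + bias count of a k x k 2-D convolution."""
    return cout * (cin // groups) * k * k + (cout if bias else 0)

dw_params = conv2d_params(768, 768, 3, groups=768)   # depthwise: 768*9 + 768 = 7680
full_params = conv2d_params(768, 768, 3)             # dense 3x3 conv, for comparison
```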
mindocr.models.backbones.mindcv_models.pvtv2.Mlp

Bases: nn.Cell

MLP with depthwise separable convolution

Source code in mindocr\models\backbones\mindcv_models\pvtv2.py
class Mlp(nn.Cell):
    """MLP with depthwise separable convolution"""

    def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.0, linear=False):
        super().__init__()
        out_features = out_features or in_features
        hidden_features = hidden_features or in_features
        self.fc1 = nn.Dense(in_features, hidden_features)
        self.dwconv = DWConv(hidden_features)
        self.act = act_layer()
        self.fc2 = nn.Dense(hidden_features, out_features)
        self.drop = nn.Dropout(1 - drop)
        self.linear = linear
        if self.linear:
            self.relu = nn.ReLU()

    def construct(self, x, H, W):
        x = self.fc1(x)
        if self.linear:
            x = self.relu(x)
        x = self.dwconv(x, H, W)
        x = self.act(x)
        x = self.drop(x)
        x = self.fc2(x)
        x = self.drop(x)
        return x
mindocr.models.backbones.mindcv_models.pvtv2.OverlapPatchEmbed

Bases: nn.Cell

Overlapping Patch Embedding

Source code in mindocr\models\backbones\mindcv_models\pvtv2.py
class OverlapPatchEmbed(nn.Cell):
    """Overlapping Patch Embedding"""

    def __init__(self, img_size=224, patch_size=7, stride=4, in_chans=3, embed_dim=768):
        super().__init__()

        img_size = (img_size, img_size)
        patch_size = (patch_size, patch_size)

        assert max(patch_size) > stride, "Set larger patch_size than stride"

        self.img_size = img_size
        self.patch_size = patch_size
        self.H, self.W = img_size[0] // stride, img_size[1] // stride
        self.num_patches = self.H * self.W
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=stride, has_bias=True)
        self.norm = nn.LayerNorm([embed_dim])

    def construct(self, x):
        x = self.proj(x)
        B, C, H, W = x.shape
        x = ops.transpose(ops.reshape(x, (B, C, H * W)), (0, 2, 1))
        x = self.norm(x)

        return x, H, W
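With MindSpore's default "same" padding, the overlapping 7x7 (or 3x3) convolution still divides the spatial side by its stride, which is what `__init__` precomputes as `self.H` and `self.W` (exact for sides divisible by the stride). A sketch:

```python
def overlap_patch_grid(img_size, stride):
    """(side, num_patches) as precomputed by OverlapPatchEmbed for a square input."""
    side = img_size // stride
    return side, side * side

# Stage 1: stride 4 on a 224 input -> 56x56 grid; later stages use stride 2.
```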
mindocr.models.backbones.mindcv_models.pvtv2.PyramidVisionTransformerV2

Bases: nn.Cell

Pyramid Vision Transformer V2 model class, based on "PVTv2: Improved Baselines with Pyramid Vision Transformer" <https://arxiv.org/abs/2106.13797>

PARAMETER DESCRIPTION
img_size

size of the input image.

TYPE: int DEFAULT: 224

patch_size

size of a single image patch.

TYPE: int DEFAULT: 16

in_chans

number of input channels.

TYPE: int DEFAULT: 3

num_classes

number of classification classes.

TYPE: int DEFAULT: 1000

embed_dims

hidden dimension of each stage's PatchEmbed.

TYPE: list DEFAULT: [64, 128, 256, 512]

num_heads

number of attention heads in each stage.

TYPE: list DEFAULT: [1, 2, 4, 8]

mlp_ratios

ratios of MLP hidden dims in each stage.

TYPE: list DEFAULT: [4, 4, 4, 4]

qkv_bias

whether to use bias in attention.

TYPE: bool DEFAULT: False

qk_scale

scale multiplied with qk in attention if not None; otherwise head_dim ** -0.5.

TYPE: float DEFAULT: None

drop_rate

drop rate for each block.

TYPE: float DEFAULT: 0.0

attn_drop_rate

drop rate for attention.

TYPE: float DEFAULT: 0.0

drop_path_rate

drop rate for drop path.

TYPE: float DEFAULT: 0.0

norm_layer

norm layer used in blocks.

TYPE: nn.Cell DEFAULT: nn.LayerNorm

depths

number of Blocks in each stage.

TYPE: list DEFAULT: [3, 4, 6, 3]

sr_ratios

spatial-reduction ratio of each stage's attention.

TYPE: list DEFAULT: [8, 4, 2, 1]

num_stages

number of stages.

TYPE: int DEFAULT: 4

linear

whether to use linear SRA.

TYPE: bool DEFAULT: False

Source code in mindocr\models\backbones\mindcv_models\pvtv2.py
class PyramidVisionTransformerV2(nn.Cell):
    r"""Pyramid Vision Transformer V2 model class, based on
    `"PVTv2: Improved Baselines with Pyramid Vision Transformer" <https://arxiv.org/abs/2106.13797>`_

    Args:
        img_size (int): size of the input image. Default: 224.
        patch_size (int): size of a single image patch. Default: 16.
        in_chans (int): number of input channels. Default: 3.
        num_classes (int): number of classification classes. Default: 1000.
        embed_dims (list): hidden dimension of each stage's PatchEmbed. Default: [64, 128, 256, 512].
        num_heads (list): number of attention heads in each stage. Default: [1, 2, 4, 8].
        mlp_ratios (list): ratios of MLP hidden dims in each stage. Default: [4, 4, 4, 4].
        qkv_bias (bool): whether to use bias in attention. Default: False.
        qk_scale (float): scale multiplied with qk in attention if not None; otherwise head_dim ** -0.5. Default: None.
        drop_rate (float): drop rate for each block. Default: 0.0.
        attn_drop_rate (float): drop rate for attention. Default: 0.0.
        drop_path_rate (float): drop rate for drop path. Default: 0.0.
        norm_layer (nn.Cell): norm layer used in blocks. Default: nn.LayerNorm.
        depths (list): number of Blocks in each stage. Default: [3, 4, 6, 3].
        sr_ratios (list): spatial-reduction ratio of each stage's attention. Default: [8, 4, 2, 1].
        num_stages (int): number of stages. Default: 4.
        linear (bool): whether to use linear SRA. Default: False.
    """

    def __init__(self, img_size=224, patch_size=16, in_chans=3, num_classes=1000, embed_dims=[64, 128, 256, 512],
                 num_heads=[1, 2, 4, 8], mlp_ratios=[4, 4, 4, 4], qkv_bias=False, qk_scale=None, drop_rate=0.,
                 attn_drop_rate=0., drop_path_rate=0., norm_layer=nn.LayerNorm,
                 depths=[3, 4, 6, 3], sr_ratios=[8, 4, 2, 1], num_stages=4, linear=False):
        super().__init__()
        self.num_classes = num_classes
        self.depths = depths
        self.num_stages = num_stages

        start = Tensor(0, mindspore.float32)
        stop = Tensor(drop_path_rate, mindspore.float32)
        dpr = [float(x) for x in ops.linspace(start, stop, sum(depths))]  # stochastic depth decay rule
        cur = 0

        patch_embed_list = []
        block_list = []
        norm_list = []

        for i in range(num_stages):
            patch_embed = OverlapPatchEmbed(img_size=img_size if i == 0 else img_size // (2 ** (i + 1)),
                                            patch_size=7 if i == 0 else 3,
                                            stride=4 if i == 0 else 2,
                                            in_chans=in_chans if i == 0 else embed_dims[i - 1],
                                            embed_dim=embed_dims[i])

            block = nn.CellList([Block(
                dim=embed_dims[i], num_heads=num_heads[i], mlp_ratio=mlp_ratios[i], qkv_bias=qkv_bias,
                qk_scale=qk_scale,
                drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[cur + j], norm_layer=norm_layer,
                sr_ratio=sr_ratios[i], linear=linear, block_id=j)
                for j in range(depths[i])])

            norm = norm_layer([embed_dims[i]])

            cur += depths[i]

            patch_embed_list.append(patch_embed)
            block_list.append(block)
            norm_list.append(norm)
        self.patch_embed_list = nn.CellList(patch_embed_list)
        self.block_list = nn.CellList(block_list)
        self.norm_list = nn.CellList(norm_list)
        # classification head
        self.head = nn.Dense(embed_dims[3], num_classes) if num_classes > 0 else Identity()
        self._initialize_weights()

    def freeze_patch_emb(self):
        self.patch_embed_list[0].requires_grad = False

    def _initialize_weights(self):
        for _, cell in self.cells_and_names():
            if isinstance(cell, nn.Dense):
                cell.weight.set_data(weight_init.initializer(weight_init.TruncatedNormal(sigma=0.02),
                                                             cell.weight.shape, cell.weight.dtype))
                if isinstance(cell, nn.Dense) and cell.bias is not None:
                    cell.bias.set_data(weight_init.initializer(weight_init.Zero(), cell.bias.shape, cell.bias.dtype))
            elif isinstance(cell, nn.LayerNorm):
                cell.gamma.set_data(weight_init.initializer(weight_init.One(), cell.gamma.shape, cell.gamma.dtype))
                cell.beta.set_data(weight_init.initializer(weight_init.Zero(), cell.beta.shape, cell.beta.dtype))
            elif isinstance(cell, nn.Conv2d):
                fan_out = cell.kernel_size[0] * cell.kernel_size[1] * cell.out_channels
                fan_out //= cell.group
                cell.weight.set_data(weight_init.initializer(weight_init.Normal(sigma=math.sqrt(2.0 / fan_out)),
                                                             cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(weight_init.initializer(weight_init.Zero(), cell.bias.shape, cell.bias.dtype))

    def get_classifier(self):
        return self.head

    def reset_classifier(self, num_classes, global_pool=""):
        self.num_classes = num_classes
        self.head = nn.Dense(self.embed_dim, num_classes) if num_classes > 0 else Identity()

    def forward_features(self, x):
        B = x.shape[0]

        for i in range(self.num_stages):
            patch_embed = self.patch_embed_list[i]
            block = self.block_list[i]
            norm = self.norm_list[i]
            x, H, W = patch_embed(x)
            for blk in block:
                x = blk(x, H, W)
            x = norm(x)
            if i != self.num_stages - 1:
                x = ops.transpose(ops.reshape(x, (B, H, W, -1)), (0, 3, 1, 2))

        return x.mean(axis=1)

    def forward_head(self, x: Tensor) -> Tensor:
        return self.head(x)

    def construct(self, x):
        x = self.forward_features(x)
        x = self.forward_head(x)

        return x
mindocr.models.backbones.mindcv_models.pvtv2.pvt_v2_b0(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get the PVTV2-b0 model. Refer to the base class "models.PVTv2" for more details.

Source code in mindocr\models\backbones\mindcv_models\pvtv2.py
@register_model
def pvt_v2_b0(
    pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3, **kwargs
) -> PyramidVisionTransformerV2:
    """Get PVTV2-b0 model
    Refer to the base class "models.PVTv2" for more details.
    """
    default_cfg = default_cfgs["pvt_v2_b0"]
    model = PyramidVisionTransformerV2(
        in_chans=in_channels, num_classes=num_classes,
        patch_size=4, embed_dims=[32, 64, 160, 256], num_heads=[1, 2, 5, 8], mlp_ratios=[8, 8, 4, 4], qkv_bias=True,
        norm_layer=partial(nn.LayerNorm, epsilon=1e-6), depths=[2, 2, 2, 2], sr_ratios=[8, 4, 2, 1], **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.pvtv2.pvt_v2_b1(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get the PVTV2-b1 model. Refer to the base class "models.PVTv2" for more details.

Source code in mindocr\models\backbones\mindcv_models\pvtv2.py
@register_model
def pvt_v2_b1(
    pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3, **kwargs
) -> PyramidVisionTransformerV2:
    """Get PVTV2-b1 model
    Refer to the base class "models.PVTv2" for more details.
    """
    default_cfg = default_cfgs["pvt_v2_b1"]
    model = PyramidVisionTransformerV2(
        in_chans=in_channels, num_classes=num_classes,
        patch_size=4, embed_dims=[64, 128, 320, 512], num_heads=[1, 2, 5, 8], mlp_ratios=[8, 8, 4, 4], qkv_bias=True,
        norm_layer=partial(nn.LayerNorm, epsilon=1e-6), depths=[2, 2, 2, 2], sr_ratios=[8, 4, 2, 1], **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.pvtv2.pvt_v2_b2(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get the PVTV2-b2 model. Refer to the base class "models.PVTv2" for more details.

Source code in mindocr\models\backbones\mindcv_models\pvtv2.py
@register_model
def pvt_v2_b2(
    pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3, **kwargs
) -> PyramidVisionTransformerV2:
    """Get PVTV2-b2 model
    Refer to the base class "models.PVTv2" for more details.
    """
    default_cfg = default_cfgs["pvt_v2_b2"]
    model = PyramidVisionTransformerV2(
        in_chans=in_channels, num_classes=num_classes,
        patch_size=4, embed_dims=[64, 128, 320, 512], num_heads=[1, 2, 5, 8], mlp_ratios=[8, 8, 4, 4], qkv_bias=True,
        norm_layer=partial(nn.LayerNorm, epsilon=1e-6), depths=[3, 4, 6, 3], sr_ratios=[8, 4, 2, 1], **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.pvtv2.pvt_v2_b3(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get the PVTV2-b3 model. Refer to the base class "models.PVTv2" for more details.

Source code in mindocr\models\backbones\mindcv_models\pvtv2.py
@register_model
def pvt_v2_b3(
    pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3, **kwargs
) -> PyramidVisionTransformerV2:
    """Get PVTV2-b3 model
    Refer to the base class "models.PVTv2" for more details.
    """
    default_cfg = default_cfgs["pvt_v2_b3"]
    model = PyramidVisionTransformerV2(
        in_chans=in_channels, num_classes=num_classes,
        patch_size=4, embed_dims=[64, 128, 320, 512], num_heads=[1, 2, 5, 8], mlp_ratios=[8, 8, 4, 4], qkv_bias=True,
        norm_layer=partial(nn.LayerNorm, epsilon=1e-6), depths=[3, 4, 18, 3], sr_ratios=[8, 4, 2, 1], **kwargs)
    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.pvtv2.pvt_v2_b4(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get the PVTV2-b4 model. Refer to the base class "models.PVTv2" for more details.

Source code in mindocr\models\backbones\mindcv_models\pvtv2.py
@register_model
def pvt_v2_b4(
    pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3, **kwargs
) -> PyramidVisionTransformerV2:
    """Get PVTV2-b4 model
    Refer to the base class "models.PVTv2" for more details.
    """
    default_cfg = default_cfgs["pvt_v2_b4"]
    model = PyramidVisionTransformerV2(
        in_chans=in_channels, num_classes=num_classes,
        patch_size=4, embed_dims=[64, 128, 320, 512], num_heads=[1, 2, 5, 8], mlp_ratios=[8, 8, 4, 4], qkv_bias=True,
        norm_layer=partial(nn.LayerNorm, epsilon=1e-6), depths=[3, 8, 27, 3], sr_ratios=[8, 4, 2, 1], **kwargs)
    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.pvtv2.pvt_v2_b5(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get the PVTV2-b5 model. Refer to the base class "models.PVTv2" for more details.

Source code in mindocr\models\backbones\mindcv_models\pvtv2.py
@register_model
def pvt_v2_b5(
    pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3, **kwargs
) -> PyramidVisionTransformerV2:
    """Get PVTV2-b5 model
    Refer to the base class "models.PVTv2" for more details.
    """
    default_cfg = default_cfgs["pvt_v2_b5"]
    model = PyramidVisionTransformerV2(
        in_chans=in_channels, num_classes=num_classes,
        patch_size=4, embed_dims=[64, 128, 320, 512], num_heads=[1, 2, 5, 8], mlp_ratios=[4, 4, 4, 4], qkv_bias=True,
        norm_layer=partial(nn.LayerNorm, epsilon=1e-6), depths=[3, 6, 40, 3], sr_ratios=[8, 4, 2, 1], **kwargs)
    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.registry

model registry and list

mindocr.models.backbones.mindcv_models.registry.get_pretrained_cfg_value(model_name, cfg_key)

Get a specific model default_cfg value by key. None if it doesn't exist.

Source code in mindocr\models\backbones\mindcv_models\registry.py
def get_pretrained_cfg_value(model_name, cfg_key):
    """Get a specific model default_cfg value by key. None if it doesn't exist."""
    if model_name in _model_pretrained_cfgs:
        return _model_pretrained_cfgs[model_name].get(cfg_key, None)
    return None
mindocr.models.backbones.mindcv_models.registry.has_pretrained_cfg_key(model_name, cfg_key)

Query model default_cfgs for existence of a specific key.

Source code in mindocr\models\backbones\mindcv_models\registry.py
def has_pretrained_cfg_key(model_name, cfg_key):
    """Query model default_cfgs for existence of a specific key."""
    if model_name in _model_pretrained_cfgs and cfg_key in _model_pretrained_cfgs[model_name]:
        return True
    return False
mindocr.models.backbones.mindcv_models.registry.is_model(model_name)

Check if a model name exists

Source code in mindocr\models\backbones\mindcv_models\registry.py
def is_model(model_name):
    """
    Check if a model name exists
    """
    return model_name in _model_entrypoints
mindocr.models.backbones.mindcv_models.registry.is_model_in_modules(model_name, module_names)

Check if a model exists within a subset of modules

Source code in mindocr\models\backbones\mindcv_models\registry.py
def is_model_in_modules(model_name, module_names):
    """
    Check if a model exists within a subset of modules
    Args:
        model_name (str) - name of model to check
        module_names (tuple, list, set) - names of modules to search in
    """
    assert isinstance(module_names, (tuple, list, set))
    return any(model_name in _module_to_models[n] for n in module_names)
mindocr.models.backbones.mindcv_models.registry.list_modules()

Return list of module names that contain models / model entrypoints

Source code in mindocr\models\backbones\mindcv_models\registry.py
def list_modules():
    """
    Return list of module names that contain models / model entrypoints
    """
    modules = _module_to_models.keys()
    return list(sorted(modules))
mindocr.models.backbones.mindcv_models.registry.model_entrypoint(model_name)

Fetch a model entrypoint for specified model name

Source code in mindocr\models\backbones\mindcv_models\registry.py
def model_entrypoint(model_name):
    """
    Fetch a model entrypoint for specified model name
    """
    return _model_entrypoints[model_name]
mindocr.models.backbones.mindcv_models.regnet

MindSpore implementation of RegNet. Refer to: Designing Network Design Spaces

mindocr.models.backbones.mindcv_models.regnet.AnyHead

Bases: nn.Cell

AnyNet head: optional conv, AvgPool, 1x1.

Source code in mindocr\models\backbones\mindcv_models\regnet.py
class AnyHead(nn.Cell):
    """AnyNet head: optional conv, AvgPool, 1x1."""

    def __init__(self, w_in, head_width, num_classes):
        super(AnyHead, self).__init__()
        self.head_width = head_width
        if head_width > 0:
            self.conv = conv2d(w_in, head_width, 1)
            self.bn = norm2d(head_width)
            self.af = activation()
            w_in = head_width
        self.avg_pool = gap2d()
        self.fc = linear(w_in, num_classes, bias=True)

    def construct(self, x):
        x = self.af(self.bn(self.conv(x))) if self.head_width > 0 else x
        x = self.avg_pool(x)
        x = self.fc(x)
        return x
mindocr.models.backbones.mindcv_models.regnet.AnyNet

Bases: nn.Cell

AnyNet model.

Source code in mindocr\models\backbones\mindcv_models\regnet.py
class AnyNet(nn.Cell):
    """AnyNet model."""

    @staticmethod
    def anynet_get_params(depths, stem_type, stem_w, block_type, widths, strides, bot_muls, group_ws, head_w,
                          num_classes, se_r):
        nones = [None for _ in depths]
        return {
            "stem_type": stem_type,
            "stem_w": stem_w,
            "block_type": block_type,
            "depths": depths,
            "widths": widths,
            "strides": strides,
            "bot_muls": bot_muls if bot_muls else nones,
            "group_ws": group_ws if group_ws else nones,
            "head_w": head_w,
            "se_r": se_r,
            "num_classes": num_classes,
        }

    def __init__(self, depths, stem_type, stem_w, block_type, widths, strides, bot_muls, group_ws, head_w, num_classes,
                 se_r, in_channels):
        super(AnyNet, self).__init__()
        p = AnyNet.anynet_get_params(depths, stem_type, stem_w, block_type, widths, strides, bot_muls, group_ws, head_w,
                                     num_classes, se_r)
        stem_fun = get_stem_fun(p["stem_type"])
        block_fun = get_block_fun(p["block_type"])
        self.stem = stem_fun(in_channels, p["stem_w"])
        prev_w = p["stem_w"]
        keys = ["depths", "widths", "strides", "bot_muls", "group_ws"]
        self.stages = nn.CellList()
        for i, (d, w, s, b, g) in enumerate(zip(*[p[k] for k in keys])):
            params = {"bot_mul": b, "group_w": g, "se_r": p["se_r"]}
            stage = AnyStage(prev_w, w, s, d, block_fun, params)
            self.stages.append(stage)
            prev_w = w
        self.head = AnyHead(prev_w, p["head_w"], p["num_classes"])
        self._initialize_weights()

    def _initialize_weights(self) -> None:
        """Initialize weights for cells."""
        for _, cell in self.cells_and_names():
            if isinstance(cell, nn.Conv2d):
                fan_out = cell.kernel_size[0] * cell.kernel_size[1] * cell.out_channels
                cell.weight.set_data(
                    init.initializer(init.Normal(sigma=math.sqrt(2.0 / fan_out), mean=0.0),
                                     cell.weight.shape, cell.weight.dtype))
            elif isinstance(cell, nn.BatchNorm2d):
                cell.gamma.set_data(init.initializer("ones", cell.gamma.shape, cell.gamma.dtype))
                cell.beta.set_data(init.initializer("zeros", cell.beta.shape, cell.beta.dtype))
            elif isinstance(cell, nn.Dense):
                cell.weight.set_data(
                    init.initializer(init.Normal(sigma=0.01, mean=0.0), cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(init.initializer("zeros", cell.bias.shape, cell.bias.dtype))

    def forward_features(self, x):
        x = self.stem(x)
        for module in self.stages:
            x = module(x)
        return x

    def forward_head(self, x):
        x = self.head(x)
        return x

    def construct(self, x):
        x = self.forward_features(x)
        x = self.forward_head(x)
        return x
mindocr.models.backbones.mindcv_models.regnet.AnyStage

Bases: nn.Cell

AnyNet stage (sequence of blocks w/ the same output shape).

Source code in mindocr\models\backbones\mindcv_models\regnet.py
class AnyStage(nn.Cell):
    """AnyNet stage (sequence of blocks w/ the same output shape)."""

    def __init__(self, w_in, w_out, stride, d, block_fun, params):
        super(AnyStage, self).__init__()
        self.blocks = nn.CellList()
        for _ in range(d):
            block = block_fun(w_in, w_out, stride, params)
            self.blocks.append(block)
            stride, w_in = 1, w_out

    def construct(self, x):
        for block in self.blocks:
            x = block(x)
        return x
mindocr.models.backbones.mindcv_models.regnet.BasicTransform

Bases: nn.Cell

Basic transformation: [3x3 conv, BN, Relu] x2.

Source code in mindocr\models\backbones\mindcv_models\regnet.py
class BasicTransform(nn.Cell):
    """Basic transformation: [3x3 conv, BN, Relu] x2."""

    def __init__(self, w_in, w_out, stride, _params):
        super(BasicTransform, self).__init__()
        self.a = conv2d(w_in, w_out, 3, stride=stride)
        self.a_bn = norm2d(w_out)
        self.a_af = activation()
        self.b = conv2d(w_out, w_out, 3)
        self.b_bn = norm2d(w_out)
        self.b_bn.final_bn = True

    def construct(self, x):
        x = self.a(x)
        x = self.a_bn(x)
        x = self.a_af(x)
        x = self.b(x)
        x = self.b_bn(x)
        return x
mindocr.models.backbones.mindcv_models.regnet.BottleneckTransform

Bases: nn.Cell

Bottleneck transformation: 1x1, 3x3 [+SE], 1x1.

Source code in mindocr\models\backbones\mindcv_models\regnet.py
class BottleneckTransform(nn.Cell):
    """Bottleneck transformation: 1x1, 3x3 [+SE], 1x1."""

    def __init__(self, w_in, w_out, stride, params):
        super(BottleneckTransform, self).__init__()
        w_b = int(round(w_out * params["bot_mul"]))
        w_se = int(round(w_in * params["se_r"]))
        groups = w_b // params["group_w"]
        self.a = conv2d(w_in, w_b, 1)
        self.a_bn = norm2d(w_b)
        self.a_af = activation()
        self.b = conv2d(w_b, w_b, 3, stride=stride, groups=groups)
        self.b_bn = norm2d(w_b)
        self.b_af = activation()
        self.se = SqueezeExcite(in_channels=w_b, rd_channels=w_se) if w_se else None
        self.c = conv2d(w_b, w_out, 1)
        self.c_bn = norm2d(w_out)
        self.c_bn.final_bn = True

    def construct(self, x):
        x = self.a(x)
        x = self.a_bn(x)
        x = self.a_af(x)
        x = self.b(x)
        x = self.b_bn(x)
        x = self.b_af(x)
        x = self.se(x) if self.se is not None else x
        x = self.c(x)
        x = self.c_bn(x)
        return x
mindocr.models.backbones.mindcv_models.regnet.RegNet

Bases: AnyNet

RegNet model class, based on "Designing Network Design Spaces" (https://arxiv.org/abs/2003.13678).

Source code in mindocr\models\backbones\mindcv_models\regnet.py
class RegNet(AnyNet):
    r"""RegNet model class, based on
    `"Designing Network Design Spaces" <https://arxiv.org/abs/2003.13678>`_
    """

    @staticmethod
    def regnet_get_params(w_a, w_0, w_m, d, stride, bot_mul, group_w, stem_type, stem_w, block_type, head_w,
                          num_classes, se_r):
        """Get AnyNet parameters that correspond to the RegNet."""
        ws, ds, ss, bs, gs = generate_regnet_full(w_a, w_0, w_m, d, stride, bot_mul, group_w)
        return {
            "stem_type": stem_type,
            "stem_w": stem_w,
            "block_type": block_type,
            "depths": ds,
            "widths": ws,
            "strides": ss,
            "bot_muls": bs,
            "group_ws": gs,
            "head_w": head_w,
            "se_r": se_r,
            "num_classes": num_classes,
        }

    def __init__(self, w_a, w_0, w_m, d, group_w, stride=2, bot_mul=1.0, stem_type="simple_stem_in", stem_w=32,
                 block_type="res_bottleneck_block", head_w=0, num_classes=1000, se_r=0.0, in_channels=3):
        params = RegNet.regnet_get_params(w_a, w_0, w_m, d, stride, bot_mul, group_w, stem_type, stem_w, block_type,
                                          head_w, num_classes, se_r)
        print(params)
        super(RegNet, self).__init__(params["depths"], params["stem_type"], params["stem_w"], params["block_type"],
                                     params["widths"], params["strides"], params["bot_muls"], params["group_ws"],
                                     params["head_w"], params["num_classes"], params["se_r"], in_channels)
mindocr.models.backbones.mindcv_models.regnet.RegNet.regnet_get_params(w_a, w_0, w_m, d, stride, bot_mul, group_w, stem_type, stem_w, block_type, head_w, num_classes, se_r) staticmethod

Get AnyNet parameters that correspond to the RegNet.

Source code in mindocr\models\backbones\mindcv_models\regnet.py
@staticmethod
def regnet_get_params(w_a, w_0, w_m, d, stride, bot_mul, group_w, stem_type, stem_w, block_type, head_w,
                      num_classes, se_r):
    """Get AnyNet parameters that correspond to the RegNet."""
    ws, ds, ss, bs, gs = generate_regnet_full(w_a, w_0, w_m, d, stride, bot_mul, group_w)
    return {
        "stem_type": stem_type,
        "stem_w": stem_w,
        "block_type": block_type,
        "depths": ds,
        "widths": ws,
        "strides": ss,
        "bot_muls": bs,
        "group_ws": gs,
        "head_w": head_w,
        "se_r": se_r,
        "num_classes": num_classes,
    }
mindocr.models.backbones.mindcv_models.regnet.ResBasicBlock

Bases: nn.Cell

Residual basic block: x + f(x), f = basic transform.

Source code in mindocr\models\backbones\mindcv_models\regnet.py
class ResBasicBlock(nn.Cell):
    """Residual basic block: x + f(x), f = basic transform."""

    def __init__(self, w_in, w_out, stride, params):
        super(ResBasicBlock, self).__init__()
        self.proj, self.bn = None, None
        if (w_in != w_out) or (stride != 1):
            self.proj = conv2d(w_in, w_out, 1, stride=stride)
            self.bn = norm2d(w_out)
        self.f = BasicTransform(w_in, w_out, stride, params)
        self.af = activation()

    def construct(self, x):
        x_p = self.bn(self.proj(x)) if self.proj is not None else x
        return self.af(x_p + self.f(x))
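Both residual block variants use the same rule for the shortcut: the identity path is shape-compatible only when the input width equals the output width and the stride is 1; otherwise a strided 1x1 projection (`self.proj` plus `self.bn`) is inserted. That condition can be isolated as a tiny framework-free helper (`needs_projection` is a hypothetical name, not part of the module):

```python
def needs_projection(w_in, w_out, stride):
    """True when the identity shortcut cannot match the residual
    branch's output shape, so a 1x1 strided conv projection is needed."""
    return (w_in != w_out) or (stride != 1)

# Same width, stride 1: the plain identity shortcut works.
same = needs_projection(64, 64, 1)
# A width change or downsampling forces a projection.
wider = needs_projection(64, 128, 1)
strided = needs_projection(64, 64, 2)
```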
mindocr.models.backbones.mindcv_models.regnet.ResBottleneckBlock

Bases: nn.Cell

Residual bottleneck block: x + f(x), f = bottleneck transform.

Source code in mindocr\models\backbones\mindcv_models\regnet.py
class ResBottleneckBlock(nn.Cell):
    """Residual bottleneck block: x + f(x), f = bottleneck transform."""

    def __init__(self, w_in, w_out, stride, params):
        super(ResBottleneckBlock, self).__init__()
        self.proj, self.bn = None, None
        if (w_in != w_out) or (stride != 1):
            self.proj = conv2d(w_in, w_out, 1, stride=stride)
            self.bn = norm2d(w_out)
        self.f = BottleneckTransform(w_in, w_out, stride, params)
        self.af = activation()

    def construct(self, x):
        x_p = self.bn(self.proj(x)) if self.proj is not None else x
        return self.af(x_p + self.f(x))
mindocr.models.backbones.mindcv_models.regnet.ResBottleneckLinearBlock

Bases: nn.Cell

Residual linear bottleneck block: x + f(x), f = bottleneck transform.

Source code in mindocr\models\backbones\mindcv_models\regnet.py
class ResBottleneckLinearBlock(nn.Cell):
    """Residual linear bottleneck block: x + f(x), f = bottleneck transform."""

    def __init__(self, w_in, w_out, stride, params):
        super(ResBottleneckLinearBlock, self).__init__()
        self.has_skip = (w_in == w_out) and (stride == 1)
        self.f = BottleneckTransform(w_in, w_out, stride, params)

    def construct(self, x):
        return x + self.f(x) if self.has_skip else self.f(x)
mindocr.models.backbones.mindcv_models.regnet.ResStem

Bases: nn.Cell

ResNet stem for ImageNet: 7x7, BN, AF, MaxPool.

Source code in mindocr\models\backbones\mindcv_models\regnet.py
class ResStem(nn.Cell):
    """ResNet stem for ImageNet: 7x7, BN, AF, MaxPool."""

    def __init__(self, w_in, w_out):
        super(ResStem, self).__init__()
        self.conv = conv2d(w_in, w_out, 7, stride=2)
        self.bn = norm2d(w_out)
        self.af = activation()
        self.pool = pool2d(w_out, 3, stride=2)

    def construct(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.af(x)
        x = self.pool(x)
        return x
mindocr.models.backbones.mindcv_models.regnet.ResStemCifar

Bases: nn.Cell

ResNet stem for CIFAR: 3x3, BN, AF.

Source code in mindocr\models\backbones\mindcv_models\regnet.py
class ResStemCifar(nn.Cell):
    """ResNet stem for CIFAR: 3x3, BN, AF."""

    def __init__(self, w_in, w_out):
        super(ResStemCifar, self).__init__()
        self.conv = conv2d(w_in, w_out, 3)
        self.bn = norm2d(w_out)
        self.af = activation()

    def construct(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.af(x)
        return x
mindocr.models.backbones.mindcv_models.regnet.SimpleStem

Bases: nn.Cell

Simple stem for ImageNet: 3x3, BN, AF.

Source code in mindocr\models\backbones\mindcv_models\regnet.py
class SimpleStem(nn.Cell):
    """Simple stem for ImageNet: 3x3, BN, AF."""

    def __init__(self, w_in, w_out):
        super(SimpleStem, self).__init__()
        self.conv = conv2d(w_in, w_out, 3, stride=2)
        self.bn = norm2d(w_out)
        self.af = activation()

    def construct(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.af(x)
        return x
mindocr.models.backbones.mindcv_models.regnet.VanillaBlock

Bases: nn.Cell

Vanilla block: [3x3 conv, BN, Relu] x2.

Source code in mindocr\models\backbones\mindcv_models\regnet.py
class VanillaBlock(nn.Cell):
    """Vanilla block: [3x3 conv, BN, Relu] x2."""

    def __init__(self, w_in, w_out, stride, _params):
        super(VanillaBlock, self).__init__()
        self.a = conv2d(w_in, w_out, 3, stride=stride)
        self.a_bn = norm2d(w_out)
        self.a_af = activation()
        self.b = conv2d(w_out, w_out, 3)
        self.b_bn = norm2d(w_out)
        self.b_af = activation()

    def construct(self, x):
        x = self.a(x)
        x = self.a_bn(x)
        x = self.a_af(x)
        x = self.b(x)
        x = self.b_bn(x)
        x = self.b_af(x)
        return x
mindocr.models.backbones.mindcv_models.regnet.activation()

Helper for building an activation layer.

Source code in mindocr\models\backbones\mindcv_models\regnet.py
def activation():
    """Helper for building an activation layer."""
    return nn.ReLU()
mindocr.models.backbones.mindcv_models.regnet.adjust_block_compatibility(ws, bs, gs)

Adjusts the compatibility of widths, bottlenecks, and groups.

Source code in mindocr\models\backbones\mindcv_models\regnet.py
def adjust_block_compatibility(ws, bs, gs):
    """Adjusts the compatibility of widths, bottlenecks, and groups."""
    assert len(ws) == len(bs) == len(gs)
    assert all(w > 0 and b > 0 and g > 0 for w, b, g in zip(ws, bs, gs))
    assert all(b < 1 or b % 1 == 0 for b in bs)
    vs = [int(max(1, w * b)) for w, b in zip(ws, bs)]
    gs = [int(min(g, v)) for g, v in zip(gs, vs)]
    ms = [np.lcm(g, int(b)) if b > 1 else g for g, b in zip(gs, bs)]
    vs = [max(m, int(round(v / m) * m)) for v, m in zip(vs, ms)]
    ws = [int(v / b) for v, b in zip(vs, bs)]
    assert all(w * b % g == 0 for w, b, g in zip(ws, bs, gs))
    return ws, bs, gs
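The adjustment logic can be traced with a small numpy-free equivalent (using `math.lcm` in place of `np.lcm`; this is an illustrative sketch under that substitution, not the module's code):

```python
import math

def adjust_widths(ws, bs, gs):
    """Round widths so every bottleneck width v = w * b is a
    positive multiple of its (possibly capped) group width."""
    vs = [int(max(1, w * b)) for w, b in zip(ws, bs)]        # bottleneck widths
    gs = [int(min(g, v)) for g, v in zip(gs, vs)]            # cap group width at v
    ms = [math.lcm(g, int(b)) if b > 1 else g for g, b in zip(gs, bs)]
    vs = [max(m, int(round(v / m) * m)) for v, m in zip(vs, ms)]  # snap to multiple
    ws = [int(v / b) for v, b in zip(vs, bs)]
    return ws, bs, gs

# Width 50 with bottleneck ratio 1.0 and group width 24 snaps to 48
# (two groups of 24), keeping the grouped conv's channel split exact.
result = adjust_widths([50], [1.0], [24])
```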
mindocr.models.backbones.mindcv_models.regnet.conv2d(w_in, w_out, k, *, stride=1, groups=1, bias=False)

Helper for building a conv2d layer.

Source code in mindocr\models\backbones\mindcv_models\regnet.py
def conv2d(w_in, w_out, k, *, stride=1, groups=1, bias=False):
    """Helper for building a conv2d layer."""
    assert k % 2 == 1, "Only odd size kernels supported to avoid padding issues."
    s, p, g, b = stride, (k - 1) // 2, groups, bias
    return nn.Conv2d(w_in, w_out, k, stride=s, pad_mode="pad", padding=p, group=g, has_bias=b)
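The odd-kernel assertion exists because padding `p = (k - 1) // 2` makes the convolution exactly shape-preserving at stride 1 (and cleanly halving for even input sizes at stride 2). The standard output-size formula makes this concrete:

```python
def conv_out_size(n, k, stride=1):
    """Spatial output size of the conv2d() helper above,
    which pads with p = (k - 1) // 2 on each side."""
    p = (k - 1) // 2
    return (n + 2 * p - k) // stride + 1

keep = conv_out_size(56, 3)       # odd kernel, stride 1: size preserved -> 56
halve = conv_out_size(56, 7, 2)   # 7x7 stride-2 stem: 56 -> 28
```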
mindocr.models.backbones.mindcv_models.regnet.gap2d(keep_dims=False)

Helper for building a gap2d layer.

Source code in mindocr\models\backbones\mindcv_models\regnet.py
def gap2d(keep_dims=False):
    """Helper for building a gap2d layer."""
    return GlobalAvgPooling(keep_dims)
mindocr.models.backbones.mindcv_models.regnet.generate_regnet(w_a, w_0, w_m, d, q=8)

Generates per stage widths and depths from RegNet parameters.

Source code in mindocr\models\backbones\mindcv_models\regnet.py
def generate_regnet(w_a, w_0, w_m, d, q=8):
    """Generates per stage widths and depths from RegNet parameters."""
    assert w_a >= 0 and w_0 > 0 and w_m > 1 and w_0 % q == 0
    # Generate continuous per-block ws
    ws_cont = np.arange(d) * w_a + w_0
    # Generate quantized per-block ws
    ks = np.round(np.log(ws_cont / w_0) / np.log(w_m))
    ws_all = w_0 * np.power(w_m, ks)
    ws_all = np.round(np.divide(ws_all, q)).astype(int) * q
    # Generate per stage ws and ds (assumes ws_all are sorted)
    ws, ds = np.unique(ws_all, return_counts=True)
    # Compute number of actual stages and total possible stages
    num_stages, total_stages = len(ws), ks.max() + 1
    # Convert numpy arrays to lists and return
    ws, ds, ws_all, ws_cont = (x.tolist() for x in (ws, ds, ws_all, ws_cont))
    return ws, ds, num_stages, total_stages, ws_all, ws_cont
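The quantization above can be reproduced without numpy. The sketch below mirrors the steps (linear ramp, snap to the nearest power of w_m, round to a multiple of q, then group equal widths into stages); the parameter values are arbitrary examples, not a published RegNet config:

```python
import math
from collections import Counter

def generate_regnet_widths(w_a, w_0, w_m, d, q=8):
    """Per-stage widths and depths from RegNet parameters (pure-Python sketch)."""
    ws_all = []
    for j in range(d):
        w_cont = w_0 + w_a * j                      # continuous linear ramp
        k = round(math.log(w_cont / w_0, w_m))      # nearest power of w_m
        w = int(round(w_0 * w_m ** k / q)) * q      # quantize to a multiple of q
        ws_all.append(w)
    counts = Counter(ws_all)                        # blocks sharing a width = one stage
    ws = sorted(counts)
    return ws, [counts[w] for w in ws]

stage_ws, stage_ds = generate_regnet_widths(w_a=36.44, w_0=48, w_m=2.49, d=13)
```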
mindocr.models.backbones.mindcv_models.regnet.generate_regnet_full(w_a, w_0, w_m, d, stride, bot_mul, group_w)

Generates per stage ws, ds, gs, bs, and ss from RegNet cfg.

Source code in mindocr\models\backbones\mindcv_models\regnet.py
def generate_regnet_full(w_a, w_0, w_m, d, stride, bot_mul, group_w):
    """Generates per stage ws, ds, gs, bs, and ss from RegNet cfg."""
    ws, ds = generate_regnet(w_a, w_0, w_m, d)[0:2]
    ss = [stride for _ in ws]
    bs = [bot_mul for _ in ws]
    gs = [group_w for _ in ws]
    ws, bs, gs = adjust_block_compatibility(ws, bs, gs)
    return ws, ds, ss, bs, gs
mindocr.models.backbones.mindcv_models.regnet.get_block_fun(block_type)

Retrieves the block function by name.

Source code in mindocr\models\backbones\mindcv_models\regnet.py
def get_block_fun(block_type):
    """Retrieves the block function by name."""
    block_funs = {
        "vanilla_block": VanillaBlock,
        "res_basic_block": ResBasicBlock,
        "res_bottleneck_block": ResBottleneckBlock,
        "res_bottleneck_linear_block": ResBottleneckLinearBlock,
    }
    err_str = "Block type '{}' not supported"
    assert block_type in block_funs.keys(), err_str.format(block_type)
    return block_funs[block_type]
mindocr.models.backbones.mindcv_models.regnet.get_stem_fun(stem_type)

Retrieves the stem function by name.

Source code in mindocr\models\backbones\mindcv_models\regnet.py
def get_stem_fun(stem_type):
    """Retrieves the stem function by name."""
    stem_funs = {
        "res_stem_cifar": ResStemCifar,
        "res_stem_in": ResStem,
        "simple_stem_in": SimpleStem,
    }
    err_str = "Stem type '{}' not supported"
    assert stem_type in stem_funs.keys(), err_str.format(stem_type)
    return stem_funs[stem_type]
mindocr.models.backbones.mindcv_models.regnet.linear(w_in, w_out, *, bias=False)

Helper for building a linear layer.

Source code in mindocr\models\backbones\mindcv_models\regnet.py
def linear(w_in, w_out, *, bias=False):
    """Helper for building a linear layer."""
    return nn.Dense(w_in, w_out, has_bias=bias)
mindocr.models.backbones.mindcv_models.regnet.norm2d(w_in, eps=1e-05, mom=0.9)

Helper for building a norm2d layer.

Source code in mindocr\models\backbones\mindcv_models\regnet.py
def norm2d(w_in, eps=1e-5, mom=0.9):
    """Helper for building a norm2d layer."""
    return nn.BatchNorm2d(num_features=w_in, eps=eps, momentum=mom)
mindocr.models.backbones.mindcv_models.regnet.pool2d(_w_in, k, *, stride=1)

Helper for building a pool2d layer.

Source code in mindocr\models\backbones\mindcv_models\regnet.py
def pool2d(_w_in, k, *, stride=1):
    """Helper for building a pool2d layer."""
    assert k % 2 == 1, "Only odd size kernels supported to avoid padding issues."
    padding = (k - 1) // 2
    pad2d = nn.Pad(((0, 0), (0, 0), (padding, padding), (padding, padding)), mode="CONSTANT")
    max_pool = nn.MaxPool2d(kernel_size=k, stride=stride, pad_mode="valid")
    return nn.SequentialCell([pad2d, max_pool])
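The odd-kernel restriction in `pool2d` exists because symmetric padding of `(k - 1) // 2` on each side, followed by a valid pool, keeps the spatial size unchanged at stride 1. The arithmetic can be checked in isolation (helper names here are illustrative, not part of the library):

```python
def same_pad(k: int) -> int:
    # Symmetric per-side padding that preserves spatial size at stride 1.
    assert k % 2 == 1, "Only odd size kernels supported to avoid padding issues."
    return (k - 1) // 2

def pooled_size(h: int, k: int, stride: int = 1) -> int:
    # Output side length of a 'valid' pool applied after explicit padding.
    p = same_pad(k)
    return (h + 2 * p - k) // stride + 1
```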
mindocr.models.backbones.mindcv_models.repmlp

MindSpore implementation of RepMLP. Refer to RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality.

mindocr.models.backbones.mindcv_models.repmlp.FFNBlock

Bases: nn.Cell

Common FFN layer

Source code in mindocr\models\backbones\mindcv_models\repmlp.py
class FFNBlock(nn.Cell):
    """Common FFN layer"""

    def __init__(self, in_channels, hidden_channels=None, out_channels=None, act_layer=nn.GELU):
        super().__init__()
        out_features = out_channels or in_channels
        hidden_features = hidden_channels or in_channels
        self.ffn_fc1 = conv_bn(in_channels, hidden_features, 1, 1, 0, has_bias=False)
        self.ffn_fc2 = conv_bn(hidden_features, out_features, 1, 1, 0, has_bias=False)
        self.act = act_layer()

    def construct(self, inputs):
        x = self.ffn_fc1(inputs)
        x = self.act(x)
        x = self.ffn_fc2(x)
        return x
mindocr.models.backbones.mindcv_models.repmlp.GlobalPerceptron

Bases: nn.Cell

GlobalPerceptron layer provides global information (one of the three components of RepMLPBlock)

Source code in mindocr\models\backbones\mindcv_models\repmlp.py
class GlobalPerceptron(nn.Cell):
    """GlobalPerceptron Layers provides global information(One of the three components of RepMLPBlock)"""

    def __init__(self, input_channels, internal_neurons):
        super(GlobalPerceptron, self).__init__()
        self.fc1 = nn.Conv2d(in_channels=input_channels, out_channels=internal_neurons, kernel_size=(1, 1), stride=1,
                             has_bias=True)
        self.fc2 = nn.Conv2d(in_channels=internal_neurons, out_channels=input_channels, kernel_size=(1, 1), stride=1,
                             has_bias=True)

        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()
        self.input_channels = input_channels
        self.shape = ops.Shape()

    def construct(self, x):
        shape = self.shape(x)
        pool = nn.AvgPool2d(kernel_size=(shape[2], shape[3]), stride=1)
        x = pool(x)
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.sigmoid(x)
        x = x.view(-1, self.input_channels, 1, 1)
        return x
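The computation above is a squeeze-and-excite-style gate: global average pool, a bottleneck of two 1x1 convolutions with ReLU between them, then a sigmoid that produces a per-channel scale. A NumPy sketch of the same math (weight shapes and names are illustrative, not the MindSpore API):

```python
import numpy as np

def global_perceptron(x, w1, b1, w2, b2):
    """Global average pool -> fc1 -> ReLU -> fc2 -> sigmoid -> (N, C, 1, 1) gate.
    Illustrative shapes: x (N, C, H, W), w1 (C_mid, C), w2 (C, C_mid)."""
    pooled = x.mean(axis=(2, 3))                    # (N, C) global average pool
    h = np.maximum(pooled @ w1.T + b1, 0.0)         # ReLU(fc1)
    gate = 1.0 / (1.0 + np.exp(-(h @ w2.T + b2)))   # sigmoid(fc2)
    return gate[:, :, None, None]                   # broadcastable over H, W

x = np.ones((2, 8, 4, 4), dtype=np.float32)
gate = global_perceptron(x, np.zeros((2, 8)), np.zeros(2),
                         np.zeros((8, 2)), np.zeros(8))
# With zero weights, sigmoid(0) = 0.5, so the gate halves every channel.
```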
mindocr.models.backbones.mindcv_models.repmlp.RepMLPBlock

Bases: nn.Cell

Basic RepMLPBlock layer (composed of Global Perceptron, Channel Perceptron and Local Perceptron)

Source code in mindocr\models\backbones\mindcv_models\repmlp.py
class RepMLPBlock(nn.Cell):
    """Basic RepMLPBlock Layer(compose of Global Perceptron, Channel Perceptron and Local Perceptron)"""

    def __init__(self, in_channels, out_channels,
                 h, w,
                 reparam_conv_k=None,
                 globalperceptron_reduce=4,
                 num_sharesets=1,
                 deploy=False):
        super().__init__()

        self.C = in_channels  # noqa: E741
        self.O = out_channels  # noqa: E741
        self.S = num_sharesets  # noqa: E741

        self.h, self.w = h, w

        self.deploy = deploy
        self.transpose = ops.Transpose()
        self.shape = ops.Shape()
        self.reshape = ops.Reshape()

        assert in_channels == out_channels
        self.gp = GlobalPerceptron(input_channels=in_channels, internal_neurons=in_channels // globalperceptron_reduce)

        self.fc3 = nn.Conv2d(in_channels=self.h * self.w * num_sharesets, out_channels=self.h * self.w * num_sharesets,
                             kernel_size=(1, 1), stride=1, padding=0, has_bias=deploy, group=num_sharesets)
        if deploy:
            self.fc3_bn = ops.Identity()
        else:
            self.fc3_bn = nn.BatchNorm2d(num_sharesets).set_train()

        self.reparam_conv_k = reparam_conv_k
        self.conv_branch_k = []
        if not deploy and reparam_conv_k is not None:
            for k in reparam_conv_k:
                conv_branch = conv_bn(num_sharesets, num_sharesets, kernel_size=k, stride=1, padding=k // 2,
                                      group=num_sharesets, has_bias=False)
                self.__setattr__("repconv{}".format(k), conv_branch)
                self.conv_branch_k.append(conv_branch)
                # print(conv_branch)

    def partition(self, x, h_parts, w_parts):
        x = x.reshape(-1, self.C, h_parts, self.h, w_parts, self.w)
        input_perm = (0, 2, 4, 1, 3, 5)
        x = self.transpose(x, input_perm)
        return x

    def partition_affine(self, x, h_parts, w_parts):
        fc_inputs = x.reshape(-1, self.S * self.h * self.w, 1, 1)
        out = self.fc3(fc_inputs)
        out = out.reshape(-1, self.S, self.h, self.w)
        out = self.fc3_bn(out)
        out = out.reshape(-1, h_parts, w_parts, self.S, self.h, self.w)
        return out

    def construct(self, inputs):
        # Global Perceptron
        global_vec = self.gp(inputs)

        origin_shape = self.shape(inputs)

        h_parts = origin_shape[2] // self.h
        w_parts = origin_shape[3] // self.w

        partitions = self.partition(inputs, h_parts, w_parts)

        #   Channel Perceptron
        fc3_out = self.partition_affine(partitions, h_parts, w_parts)

        #   Local Perceptron
        if self.reparam_conv_k is not None and not self.deploy:
            conv_inputs = self.reshape(partitions, (-1, self.S, self.h, self.w))
            conv_out = 0
            for k in self.conv_branch_k:
                conv_out += k(conv_inputs)
            conv_out = self.reshape(conv_out, (-1, h_parts, w_parts, self.S, self.h, self.w))
            fc3_out += conv_out

        input_perm = (0, 3, 1, 4, 2, 5)
        fc3_out = self.transpose(fc3_out, input_perm)  # N, O, h_parts, out_h, w_parts, out_w
        out = fc3_out.reshape(*origin_shape)
        out = out * global_vec
        return out

    def get_equivalent_fc3(self):
        fc_weight, fc_bias = fuse_bn(self.fc3, self.fc3_bn)
        if self.reparam_conv_k is not None:
            largest_k = max(self.reparam_conv_k)
            largest_branch = self.__getattr__("repconv{}".format(largest_k))
            total_kernel, total_bias = fuse_bn(largest_branch.conv, largest_branch.bn)
            for k in self.reparam_conv_k:
                if k != largest_k:
                    k_branch = self.__getattr__("repconv{}".format(k))
                    kernel, bias = fuse_bn(k_branch.conv, k_branch.bn)
                    total_kernel += ops.pad(kernel, [(largest_k - k) // 2] * 4)
                    total_bias += bias
            rep_weight, rep_bias = self._convert_conv_to_fc(total_kernel, total_bias)
            final_fc3_weight = rep_weight.reshape_as(fc_weight) + fc_weight
            final_fc3_bias = rep_bias + fc_bias
        else:
            final_fc3_weight = fc_weight
            final_fc3_bias = fc_bias
        return final_fc3_weight, final_fc3_bias

    def local_inject(self):
        self.deploy = True
        #   Locality Injection
        fc3_weight, fc3_bias = self.get_equivalent_fc3()
        #   Remove Local Perceptron
        if self.reparam_conv_k is not None:
            for k in self.reparam_conv_k:
                self.__delattr__("repconv{}".format(k))
        self.__delattr__("fc3")
        self.__delattr__("fc3_bn")
        self.fc3 = nn.Conv2d(self.S * self.h * self.w, self.S * self.h * self.w, 1, 1, 0, has_bias=True, group=self.S)
        self.fc3_bn = ops.Identity()
        self.fc3.weight.data = fc3_weight
        self.fc3.bias.data = fc3_bias

    def _convert_conv_to_fc(self, conv_kernel, conv_bias):
        I = ops.eye(self.h * self.w).repeat(1, self.S).reshape(self.h * self.w, self.S, self.h, self.w)  # noqa: E741
        fc_k = ops.Conv2D(I, conv_kernel, pad=(conv_kernel.size(2) // 2, conv_kernel.size(3) // 2), group=self.S)
        fc_k = fc_k.reshape(self.h * self.w, self.S * self.h * self.w).t()
        fc_bias = conv_bias.repeat_interleave(self.h * self.w)
        return fc_k, fc_bias
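The `partition` method splits the feature map into non-overlapping h x w windows, and the permutation at the end of `construct` undoes it. The index bookkeeping is easy to verify in NumPy (a sketch of the same reshapes, not the MindSpore code):

```python
import numpy as np

# Split (N, C, H, W) into h x w windows, then restore the original layout.
N, C, h_parts, w_parts, h, w = 2, 3, 2, 2, 4, 4
rng = np.random.default_rng(0)
x = rng.standard_normal((N, C, h_parts * h, w_parts * w))

# partition: (N, C, H, W) -> (N, h_parts, w_parts, C, h, w)
parts = x.reshape(N, C, h_parts, h, w_parts, w).transpose(0, 2, 4, 1, 3, 5)

# inverse, as at the end of construct: permute with (0, 3, 1, 4, 2, 5), reshape
restored = parts.transpose(0, 3, 1, 4, 2, 5).reshape(N, C, h_parts * h,
                                                     w_parts * w)
```

Because the two permutations are inverses, `restored` is bit-for-bit equal to `x`.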
mindocr.models.backbones.mindcv_models.repmlp.RepMLPNet

Bases: nn.Cell

RepMLPNet model class, based on "RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality" (https://arxiv.org/pdf/2112.11081v2.pdf)

PARAMETER DESCRIPTION
in_channels

number of input channels. Default: 3.

DEFAULT: 3

num_classes

number of classification classes. Default: 1000.

patch_size

size of a single image patch. Default: (4, 4)

DEFAULT: (4, 4)

num_blocks

number of blocks per stage. Default: (2,2,6,2)

DEFAULT: (2, 2, 6, 2)

channels

number of in_channels(channels[stage_idx]) and out_channels(channels[stage_idx + 1]) per stage. Default: (192,384,768,1536)

DEFAULT: (192, 384, 768, 1536)

hs

height of picture per stage. Default: (64,32,16,8)

DEFAULT: (64, 32, 16, 8)

ws

width of picture per stage. Default: (64,32,16,8)

DEFAULT: (64, 32, 16, 8)

sharesets_nums

number of share sets per stage. Default: (4,8,16,32)

DEFAULT: (4, 8, 16, 32)

reparam_conv_k

convolution kernel size in local Perceptron. Default: (3,)

DEFAULT: (3,)

globalperceptron_reduce

Intermediate convolution output size (in_channel = in_channels, out_channel = in_channel / globalperceptron_reduce) in the global perceptron. Default: 4

DEFAULT: 4

use_checkpoint

whether to use checkpoint

DEFAULT: False

deploy

whether to build the inference-time re-parameterized model (conv biases replace BatchNorm). Default: False

DEFAULT: False

Source code in mindocr\models\backbones\mindcv_models\repmlp.py
class RepMLPNet(nn.Cell):
    r"""RepMLPNet model class, based on
    `"RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality" <https://arxiv.org/pdf/2112.11081v2.pdf>`_

    Args:
        in_channels: number of input channels. Default: 3.
        num_classes: number of classification classes. Default: 1000.
        patch_size: size of a single image patch. Default: (4, 4)
        num_blocks: number of blocks per stage. Default: (2,2,6,2)
        channels: number of in_channels(channels[stage_idx]) and out_channels(channels[stage_idx + 1]) per stage.
            Default: (192,384,768,1536)
        hs: height of picture per stage. Default: (64,32,16,8)
        ws: width of picture per stage. Default: (64,32,16,8)
        sharesets_nums: number of share sets per stage. Default: (4,8,16,32)
        reparam_conv_k: convolution kernel size in local Perceptron. Default: (3,)
        globalperceptron_reduce: Intermediate convolution output size (in_channel = in_channels,
            out_channel = in_channel / globalperceptron_reduce) in the global perceptron. Default: 4
        use_checkpoint: whether to use checkpoint
        deploy: whether to build the inference-time re-parameterized model (conv biases replace BatchNorm). Default: False
    """

    def __init__(self,
                 in_channels=3, num_class=1000,
                 patch_size=(4, 4),
                 num_blocks=(2, 2, 6, 2), channels=(192, 384, 768, 1536),
                 hs=(64, 32, 16, 8), ws=(64, 32, 16, 8),
                 sharesets_nums=(4, 8, 16, 32),
                 reparam_conv_k=(3,),
                 globalperceptron_reduce=4, use_checkpoint=False,
                 deploy=False):
        super().__init__()
        num_stages = len(num_blocks)
        assert num_stages == len(channels)
        assert num_stages == len(hs)
        assert num_stages == len(ws)
        assert num_stages == len(sharesets_nums)

        self.conv_embedding = conv_bn_relu(in_channels, channels[0], kernel_size=patch_size, stride=patch_size,
                                           padding=0, has_bias=False)
        self.conv2d = nn.Conv2d(in_channels, channels[0], kernel_size=patch_size, stride=patch_size, padding=0)

        stages = []
        embeds = []
        for stage_idx in range(num_stages):
            stage_blocks = [RepMLPNetUnit(channels=channels[stage_idx], h=hs[stage_idx], w=ws[stage_idx],
                                          reparam_conv_k=reparam_conv_k,
                                          globalperceptron_reduce=globalperceptron_reduce, ffn_expand=4,
                                          num_sharesets=sharesets_nums[stage_idx],
                                          deploy=deploy) for _ in range(num_blocks[stage_idx])]
            stages.append(nn.CellList(stage_blocks))
            if stage_idx < num_stages - 1:
                embeds.append(
                    conv_bn_relu(in_channels=channels[stage_idx], out_channels=channels[stage_idx + 1], kernel_size=2,
                                 stride=2, padding=0))
        self.stages = nn.CellList(stages)
        self.embeds = nn.CellList(embeds)
        self.head_norm = nn.BatchNorm2d(channels[-1]).set_train()
        self.head = nn.Dense(channels[-1], num_class)

        self.use_checkpoint = use_checkpoint
        self.shape = ops.Shape()
        self.reshape = ops.Reshape()
        self._initialize_weights()

    def _initialize_weights(self):
        """Initialize weights for cells."""
        for name, cell in self.cells_and_names():
            if isinstance(cell, nn.Conv2d):
                k = cell.group / (cell.in_channels * cell.kernel_size[0] * cell.kernel_size[1])
                k = k**0.5
                cell.weight.set_data(init.initializer(init.Uniform(k), cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(init.initializer(init.Uniform(k), cell.bias.shape, cell.bias.dtype))
            elif isinstance(cell, nn.Dense):
                k = 1 / cell.in_channels
                k = k**0.5
                cell.weight.set_data(init.initializer(init.Uniform(k), cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(init.initializer(init.Uniform(k), cell.bias.shape, cell.bias.dtype))

    def forward_features(self, x: Tensor) -> Tensor:
        x = self.conv_embedding(x)

        for i, stage in enumerate(self.stages):
            for block in stage:
                x = block(x)

            if i < len(self.stages) - 1:
                embed = self.embeds[i]
                x = embed(x)
        x = self.head_norm(x)
        shape = self.shape(x)
        pool = nn.AvgPool2d(kernel_size=(shape[2], shape[3]))
        x = pool(x)
        return x.view(shape[0], -1)

    def forward_head(self, x: Tensor) -> Tensor:
        return self.head(x)

    def construct(self, x: Tensor) -> Tensor:
        x = self.forward_features(x)
        return self.forward_head(x)
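The `hs`/`ws` defaults are simply the feature-map side lengths implied by the input resolution: the patch embedding divides by `patch_size`, and each inter-stage transition conv (kernel 2, stride 2) halves the result. A small hypothetical helper (not part of the library) makes the arithmetic explicit:

```python
def stage_sizes(image_size: int, patch: int, num_stages: int):
    # Patch embedding divides the input by `patch`; each of the
    # (num_stages - 1) stride-2 transition convs then halves it.
    s = image_size // patch
    sizes = [s]
    for _ in range(num_stages - 1):
        s //= 2
        sizes.append(s)
    return tuple(sizes)
```

This reproduces both default configurations: 256x256 inputs give the (64, 32, 16, 8) defaults above, while the 224-pixel variants below use (56, 28, 14, 7).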
mindocr.models.backbones.mindcv_models.repmlp.RepMLPNetUnit

Bases: nn.Cell

Basic unit of RepMLPNet

Source code in mindocr\models\backbones\mindcv_models\repmlp.py
class RepMLPNetUnit(nn.Cell):
    """Basic unit of RepMLPNet"""

    def __init__(self, channels, h, w, reparam_conv_k, globalperceptron_reduce, ffn_expand=4,
                 num_sharesets=1, deploy=False):
        super().__init__()
        self.repmlp_block = RepMLPBlock(in_channels=channels, out_channels=channels, h=h, w=w,
                                        reparam_conv_k=reparam_conv_k, globalperceptron_reduce=globalperceptron_reduce,
                                        num_sharesets=num_sharesets, deploy=deploy)
        self.ffn_block = FFNBlock(channels, channels * ffn_expand)
        self.prebn1 = nn.BatchNorm2d(channels).set_train()
        self.prebn2 = nn.BatchNorm2d(channels).set_train()

    def construct(self, x):
        y = x + self.repmlp_block(self.prebn1(x))
        # print(y)
        z = y + self.ffn_block(self.prebn2(y))
        return z
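The unit's `construct` is the pre-norm residual pattern: each sub-block sees a normalized input but its output is added back onto the raw stream. A NumPy sketch (all arguments are illustrative callables, not MindSpore cells):

```python
import numpy as np

def repmlpnet_unit(x, token_mixer, ffn, norm1, norm2):
    # Pre-norm residual pattern of RepMLPNetUnit.construct.
    y = x + token_mixer(norm1(x))
    z = y + ffn(norm2(y))
    return z

x = np.ones((1, 4))
out = repmlpnet_unit(x,
                     lambda t: np.zeros_like(t),  # stand-in RepMLPBlock
                     lambda t: np.zeros_like(t),  # stand-in FFNBlock
                     lambda t: t, lambda t: t)    # stand-in BatchNorms
# With zero-output sub-blocks the unit reduces to the identity.
```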
mindocr.models.backbones.mindcv_models.repmlp.RepMLPNet_B224(pretrained=False, image_size=224, num_classes=1000, in_channels=3, deploy=False, **kwargs)

Get RepMLPNet_B224 model. Refer to the base class models.RepMLPNet for more details.

Source code in mindocr\models\backbones\mindcv_models\repmlp.py
@register_model
def RepMLPNet_B224(pretrained: bool = False, image_size: int = 224, num_classes: int = 1000, in_channels=3,
                   deploy=False, **kwargs):
    """Get RepMLPNet_B224 model.
    Refer to the base class `models.RepMLPNet` for more details."""
    default_cfg = default_cfgs["RepMLPNet_B224"]
    model = RepMLPNet(in_channels=in_channels, num_class=num_classes, channels=(96, 192, 384, 768), hs=(56, 28, 14, 7),
                      ws=(56, 28, 14, 7),
                      num_blocks=(2, 2, 12, 2), reparam_conv_k=(1, 3), sharesets_nums=(1, 4, 32, 128),
                      deploy=deploy)
    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.repmlp.RepMLPNet_B256(pretrained=False, image_size=256, num_classes=1000, in_channels=3, deploy=False, **kwargs)

Get RepMLPNet_B256 model. Refer to the base class models.RepMLPNet for more details.

Source code in mindocr\models\backbones\mindcv_models\repmlp.py
@register_model
def RepMLPNet_B256(pretrained: bool = False, image_size: int = 256, num_classes: int = 1000, in_channels=3,
                   deploy=False, **kwargs):
    """Get RepMLPNet_B256 model.
    Refer to the base class `models.RepMLPNet` for more details."""
    default_cfg = default_cfgs["RepMLPNet_B256"]
    model = RepMLPNet(in_channels=in_channels, num_class=num_classes, channels=(96, 192, 384, 768), hs=(64, 32, 16, 8),
                      ws=(64, 32, 16, 8),
                      num_blocks=(2, 2, 12, 2), reparam_conv_k=(1, 3), sharesets_nums=(1, 4, 32, 128),
                      deploy=deploy)
    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.repmlp.RepMLPNet_D256(pretrained=False, image_size=256, num_classes=1000, in_channels=3, deploy=False, **kwargs)

Get RepMLPNet_D256 model. Refer to the base class models.RepMLPNet for more details.

Source code in mindocr\models\backbones\mindcv_models\repmlp.py
@register_model
def RepMLPNet_D256(pretrained: bool = False, image_size: int = 256, num_classes: int = 1000, in_channels=3,
                   deploy=False, **kwargs):
    """Get RepMLPNet_D256 model.
    Refer to the base class `models.RepMLPNet` for more details."""
    default_cfg = default_cfgs["RepMLPNet_D256"]
    model = RepMLPNet(in_channels=in_channels, num_class=num_classes, channels=(80, 160, 320, 640), hs=(64, 32, 16, 8),
                      ws=(64, 32, 16, 8),
                      num_blocks=(2, 2, 18, 2), reparam_conv_k=(1, 3), sharesets_nums=(1, 4, 16, 128),
                      deploy=deploy)
    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.repmlp.RepMLPNet_L256(pretrained=False, image_size=256, num_classes=1000, in_channels=3, deploy=False, **kwargs)

Get RepMLPNet_L256 model. Refer to the base class models.RepMLPNet for more details.

Source code in mindocr\models\backbones\mindcv_models\repmlp.py
@register_model
def RepMLPNet_L256(pretrained: bool = False, image_size: int = 256, num_classes: int = 1000, in_channels=3,
                   deploy=False, **kwargs):
    """Get RepMLPNet_L256 model.
    Refer to the base class `models.RepMLPNet` for more details."""
    default_cfg = default_cfgs["RepMLPNet_L256"]
    model = RepMLPNet(in_channels=in_channels, num_class=num_classes, channels=(96, 192, 384, 768), hs=(64, 32, 16, 8),
                      ws=(64, 32, 16, 8),
                      num_blocks=(2, 2, 18, 2), reparam_conv_k=(1, 3), sharesets_nums=(1, 4, 32, 256),
                      deploy=deploy)
    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.repmlp.RepMLPNet_T224(pretrained=False, image_size=224, num_classes=1000, in_channels=3, deploy=False, **kwargs)

Get RepMLPNet_T224 model. Refer to the base class models.RepMLPNet for more details.

Source code in mindocr\models\backbones\mindcv_models\repmlp.py
@register_model
def RepMLPNet_T224(pretrained: bool = False, image_size: int = 224, num_classes: int = 1000, in_channels=3,
                   deploy=False, **kwargs):
    """Get RepMLPNet_T224 model.
    Refer to the base class `models.RepMLPNet` for more details."""
    default_cfg = default_cfgs["RepMLPNet_T224"]
    model = RepMLPNet(in_channels=in_channels, num_class=num_classes, channels=(64, 128, 256, 512), hs=(56, 28, 14, 7),
                      ws=(56, 28, 14, 7),
                      num_blocks=(2, 2, 6, 2), reparam_conv_k=(1, 3), sharesets_nums=(1, 4, 16, 128),
                      deploy=deploy)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.repmlp.RepMLPNet_T256(pretrained=False, image_size=256, num_classes=1000, in_channels=3, deploy=False, **kwargs)

Get RepMLPNet_T256 model. Refer to the base class models.RepMLPNet for more details.

Source code in mindocr\models\backbones\mindcv_models\repmlp.py
@register_model
def RepMLPNet_T256(pretrained: bool = False, image_size: int = 256, num_classes: int = 1000, in_channels=3,
                   deploy=False, **kwargs):
    """Get RepMLPNet_T256 model.
    Refer to the base class `models.RepMLPNet` for more details."""
    default_cfg = default_cfgs["RepMLPNet_T256"]
    model = RepMLPNet(in_channels=in_channels, num_class=num_classes, channels=(64, 128, 256, 512), hs=(64, 32, 16, 8),
                      ws=(64, 32, 16, 8),
                      num_blocks=(2, 2, 6, 2), reparam_conv_k=(1, 3), sharesets_nums=(1, 4, 16, 128),
                      deploy=deploy)
    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.repvgg

MindSpore implementation of RepVGG. Refer to RepVGG: Making VGG-style ConvNets Great Again.

mindocr.models.backbones.mindcv_models.repvgg.RepVGG

Bases: nn.Cell

RepVGG model class, based on "RepVGG: Making VGG-style ConvNets Great Again" (https://arxiv.org/pdf/2101.03697)

PARAMETER DESCRIPTION
num_blocks

number of RepVGGBlocks per stage.

TYPE: list

num_classes

number of classification classes. Default: 1000.

TYPE: int DEFAULT: 1000

in_channels

number of input channels. Default: 3.

TYPE: int DEFAULT: 3

width_multiplier

per-stage width multipliers applied to the base channel counts (64, 128, 256, 512).

TYPE: list DEFAULT: None

override_group_map

optional mapping from block index to the number of groups used by that block's convolutions.

TYPE: dict DEFAULT: None

deploy

use rbr_reparam block or not. Default: False

TYPE: bool DEFAULT: False

use_se

use se_block or not. Default: False

TYPE: bool DEFAULT: False

Source code in mindocr\models\backbones\mindcv_models\repvgg.py
class RepVGG(nn.Cell):
    r"""RepVGG model class, based on
    `"RepVGGBlock: An all-MLP Architecture for Vision" <https://arxiv.org/pdf/2101.03697>`_

    Args:
        num_blocks (list): number of RepVGGBlocks per stage.
        num_classes (int): number of classification classes. Default: 1000.
        in_channels (int): number of input channels. Default: 3.
        width_multiplier (list): per-stage width multipliers applied to the base channel counts (64, 128, 256, 512).
        override_group_map (dict): optional mapping from block index to the number of groups for that block's convolutions.
        deploy (bool): use rbr_reparam block or not. Default: False
        use_se (bool): use se_block or not. Default: False
    """

    def __init__(self, num_blocks, num_classes=1000, in_channels=3, width_multiplier=None, override_group_map=None,
                 deploy=False, use_se=False):
        super().__init__()

        assert len(width_multiplier) == 4

        self.deploy = deploy
        self.override_group_map = override_group_map or {}
        self.use_se = use_se

        assert 0 not in self.override_group_map

        self.in_planes = min(64, int(64 * width_multiplier[0]))

        self.stage0 = RepVGGBlock(in_channels=in_channels, out_channels=self.in_planes, kernel_size=3, stride=2,
                                  padding=1,
                                  deploy=self.deploy, use_se=self.use_se)
        self.cur_layer_idx = 1
        self.stage1 = self._make_stage(
            int(64 * width_multiplier[0]), num_blocks[0], stride=2)
        self.stage2 = self._make_stage(
            int(128 * width_multiplier[1]), num_blocks[1], stride=2)
        self.stage3 = self._make_stage(
            int(256 * width_multiplier[2]), num_blocks[2], stride=2)
        self.stage4 = self._make_stage(
            int(512 * width_multiplier[3]), num_blocks[3], stride=2)
        self.gap = GlobalAvgPooling()
        self.linear = nn.Dense(int(512 * width_multiplier[3]), num_classes)
        self._initialize_weights()

    def _make_stage(self, planes, num_blocks, stride):
        strides = [stride] + [1] * (num_blocks - 1)
        blocks = []
        for s in strides:
            cur_group = self.override_group_map.get(self.cur_layer_idx, 1)
            blocks.append(RepVGGBlock(in_channels=self.in_planes, out_channels=planes, kernel_size=3,
                                      stride=s, padding=1, group=cur_group, deploy=self.deploy,
                                      use_se=self.use_se))
            self.in_planes = planes
            self.cur_layer_idx += 1
        return nn.SequentialCell(blocks)

    def _initialize_weights(self) -> None:
        """Initialize weights for cells."""
        for _, cell in self.cells_and_names():
            if isinstance(cell, nn.Conv2d):
                cell.weight.set_data(
                    init.initializer(init.HeNormal(mode='fan_out', nonlinearity='relu'),
                                     cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(
                        init.initializer('zeros', cell.bias.shape, cell.bias.dtype))
            elif isinstance(cell, nn.BatchNorm2d):
                cell.gamma.set_data(init.initializer('ones', cell.gamma.shape, cell.gamma.dtype))
                cell.beta.set_data(init.initializer('zeros', cell.beta.shape, cell.beta.dtype))
            elif isinstance(cell, nn.Dense):
                cell.weight.set_data(
                    init.initializer(init.HeUniform(mode='fan_in', nonlinearity='sigmoid'),
                                     cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(init.initializer('zeros', cell.bias.shape, cell.bias.dtype))

    def construct(self, x):
        x = self.stage0(x)
        x = self.stage1(x)
        x = self.stage2(x)
        x = self.stage3(x)
        x = self.stage4(x)
        x = self.gap(x)
        x = self.linear(x)
        return x
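In `_make_stage`, only the first block of a stage downsamples; the rest run at stride 1. The stride schedule is a one-liner worth seeing on its own (a hypothetical helper mirroring the code above):

```python
def stage_strides(num_blocks: int, stride: int = 2):
    # First block of a stage downsamples; the remaining blocks
    # keep the resolution, as in RepVGG._make_stage.
    return [stride] + [1] * (num_blocks - 1)
```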
mindocr.models.backbones.mindcv_models.repvgg.RepVGGBlock

Bases: nn.Cell

Basic Block of RepVGG

Source code in mindocr\models\backbones\mindcv_models\repvgg.py
class RepVGGBlock(nn.Cell):
    """Basic Block of RepVGG"""
    def __init__(self, in_channels: int, out_channels: int, kernel_size: int,
                 stride: int = 1, padding: int = 0, dilation: int = 1,
                 group: int = 1, padding_mode: str = "zeros",
                 deploy: bool = False, use_se: bool = False) -> None:
        super().__init__()
        self.deploy = deploy
        self.group = group
        self.in_channels = in_channels

        assert kernel_size == 3
        assert padding == 1

        padding_11 = padding - kernel_size // 2

        self.nonlinearity = nn.ReLU()

        if use_se:
            self.se = SqueezeExcite(
                in_channels=out_channels, rd_channels=out_channels // 16)
        else:
            self.se = Identity()

        if deploy:
            self.rbr_reparam = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size,
                                         stride=stride, padding=padding, dilation=dilation, group=group, has_bias=True,
                                         pad_mode=padding_mode)
        else:
            self.rbr_reparam = None
            self.rbr_identity = nn.BatchNorm2d(
                num_features=in_channels) if out_channels == in_channels and stride == 1 else None

            self.rbr_dense = conv_bn(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size,
                                     stride=stride, padding=padding, group=group)
            self.rbr_1x1 = conv_bn(in_channels=in_channels, out_channels=out_channels, kernel_size=1, stride=stride,
                                   padding=padding_11, group=group)

    def construct(self, inputs: Tensor) -> Tensor:
        if self.rbr_reparam is not None:
            return self.nonlinearity(self.se(self.rbr_reparam(inputs)))

        if self.rbr_identity is None:
            id_out = 0
        else:
            id_out = self.rbr_identity(inputs)

        return self.nonlinearity(self.se(self.rbr_dense(inputs) + self.rbr_1x1(inputs) + id_out))

    def get_custom_l2(self):
        """This may improve the accuracy and facilitates quantization in some cases."""
        k3 = self.rbr_dense.conv.weight
        k1 = self.rbr_1x1.conv.weight

        t3 = self.rbr_dense.bn.weight / (
            ops.sqrt((self.rbr_dense.bn.moving_variance + self.rbr_dense.bn.eps)))
        t3 = ops.reshape(t3, (-1, 1, 1, 1))

        t1 = (self.rbr_1x1.bn.weight /
              ((self.rbr_1x1.bn.moving_variance + self.rbr_1x1.bn.eps).sqrt()))
        t1 = ops.reshape(t1, (-1, 1, 1, 1))

        l2_loss_circle = ops.reduce_sum(k3**2) - ops.reduce_sum(k3[:, :, 1:2, 1:2] ** 2)
        eq_kernel = k3[:, :, 1:2, 1:2] * t3 + k1 * t1
        l2_loss_eq_kernel = ops.reduce_sum(eq_kernel**2 / (t3**2 + t1**2))
        return l2_loss_eq_kernel + l2_loss_circle

    #   This func derives the equivalent kernel and bias in a DIFFERENTIABLE way.
    #   You can get the equivalent kernel and bias at any time and do whatever you want,
    #   for example, apply some penalties or constraints during training, just like you do to the other models.
    #   May be useful for quantization or pruning.
    def get_equivalent_kernel_bias(self):
        kernel3x3, bias3x3 = self._fuse_bn_tensor(self.rbr_dense)
        kernel1x1, bias1x1 = self._fuse_bn_tensor(self.rbr_1x1)
        kernelid, biasid = self._fuse_bn_tensor(self.rbr_identity)
        return kernel3x3 + self._pad_1x1_to_3x3_tensor(kernel1x1) + kernelid, bias3x3 + bias1x1 + biasid

    def _pad_1x1_to_3x3_tensor(self, kernel1x1):
        if kernel1x1 is None:
            return 0
        return ops.pad(kernel1x1, ((1, 1), (1, 1)))

    def _fuse_bn_tensor(self, branch):
        if branch is None:
            return 0, 0
        if isinstance(branch, nn.SequentialCell):
            kernel = branch.conv.weight
            moving_mean = branch.bn.moving_mean
            moving_variance = branch.bn.moving_variance
            gamma = branch.bn.gamma
            beta = branch.bn.beta
            eps = branch.bn.eps
        else:
            assert isinstance(branch, (nn.BatchNorm2d, nn.SyncBatchNorm))
            if not hasattr(self, "id_tensor"):
                input_dim = self.in_channels // self.group
                kernel_value = np.zeros((self.in_channels, input_dim, 3, 3), dtype=np.float32)
                for i in range(self.in_channels):
                    kernel_value[i, i % input_dim, 1, 1] = 1
                self.id_tensor = Tensor(kernel_value, dtype=branch.weight.dtype)
            kernel = self.id_tensor
            moving_mean = branch.moving_mean
            moving_variance = branch.moving_variance
            gamma = branch.gamma
            beta = branch.beta
            eps = branch.eps
        std = ops.sqrt(moving_variance + eps)
        t = ops.reshape(gamma / std, (-1, 1, 1, 1))
        return kernel * t, beta - moving_mean * gamma / std

    def switch_to_deploy(self):
        """Model_convert"""
        if self.rbr_reparam is not None:
            return
        kernel, bias = self.get_equivalent_kernel_bias()
        self.rbr_reparam = nn.Conv2d(in_channels=self.rbr_dense.conv.in_channels,
                                     out_channels=self.rbr_dense.conv.out_channels,
                                     kernel_size=self.rbr_dense.conv.kernel_size, stride=self.rbr_dense.conv.stride,
                                     padding=self.rbr_dense.conv.padding, dilation=self.rbr_dense.conv.dilation,
                                     group=self.rbr_dense.conv.group, has_bias=True, pad_mode="pad")
        self.rbr_reparam.weight.data = kernel
        self.rbr_reparam.bias.data = bias
        for para in self.parameters():
            para.detach_()
        self.__delattr__("rbr_dense")
        self.__delattr__("rbr_1x1")
        if hasattr(self, "rbr_identity"):
            self.__delattr__("rbr_identity")
        if hasattr(self, "id_tensor"):
            self.__delattr__("id_tensor")
        self.deploy = True
mindocr.models.backbones.mindcv_models.repvgg.RepVGGBlock.get_custom_l2()

This may improve accuracy and facilitate quantization in some cases.

Source code in mindocr\models\backbones\mindcv_models\repvgg.py
def get_custom_l2(self):
    """This may improve the accuracy and facilitates quantization in some cases."""
    k3 = self.rbr_dense.conv.weight
    k1 = self.rbr_1x1.conv.weight

    t3 = self.rbr_dense.bn.weight / (
        ops.sqrt((self.rbr_dense.bn.moving_variance + self.rbr_dense.bn.eps)))
    t3 = ops.reshape(t3, (-1, 1, 1, 1))

    t1 = (self.rbr_1x1.bn.weight /
          ((self.rbr_1x1.bn.moving_variance + self.rbr_1x1.bn.eps).sqrt()))
    t1 = ops.reshape(t1, (-1, 1, 1, 1))

    l2_loss_circle = ops.reduce_sum(k3**2) - ops.reduce_sum(k3[:, :, 1:2, 1:2] ** 2)
    eq_kernel = k3[:, :, 1:2, 1:2] * t3 + k1 * t1
    l2_loss_eq_kernel = ops.reduce_sum(eq_kernel**2 / (t3**2 + t1**2))
    return l2_loss_eq_kernel + l2_loss_circle
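The custom L2 term above penalizes the 3x3 kernel's ring and the BN-weighted sum of the 3x3 center and the 1x1 kernel as one equivalent kernel. A minimal NumPy sketch of the same arithmetic (all array shapes and the `custom_l2` helper are illustrative, not part of the MindOCR API):

```python
import numpy as np

def custom_l2(k3, k1, gamma3, var3, gamma1, var1, eps=1e-5):
    # Per-output-channel BN scale, broadcast over (out, in, kh, kw).
    t3 = (gamma3 / np.sqrt(var3 + eps)).reshape(-1, 1, 1, 1)
    t1 = (gamma1 / np.sqrt(var1 + eps)).reshape(-1, 1, 1, 1)
    # Penalize everything in the 3x3 kernel except its center...
    circle = (k3 ** 2).sum() - (k3[:, :, 1:2, 1:2] ** 2).sum()
    # ...and the BN-weighted combination of the two kernel centers.
    eq = k3[:, :, 1:2, 1:2] * t3 + k1 * t1
    return circle + (eq ** 2 / (t3 ** 2 + t1 ** 2)).sum()
```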
mindocr.models.backbones.mindcv_models.repvgg.RepVGGBlock.switch_to_deploy()

Convert the block to deploy mode by replacing the training-time branches with a single fused convolution.

Source code in mindocr\models\backbones\mindcv_models\repvgg.py
def switch_to_deploy(self):
    """Model_convert"""
    if self.rbr_reparam is not None:
        return
    kernel, bias = self.get_equivalent_kernel_bias()
    self.rbr_reparam = nn.Conv2d(in_channels=self.rbr_dense.conv.in_channels,
                                 out_channels=self.rbr_dense.conv.out_channels,
                                 kernel_size=self.rbr_dense.conv.kernel_size, stride=self.rbr_dense.conv.stride,
                                 padding=self.rbr_dense.conv.padding, dilation=self.rbr_dense.conv.dilation,
                                 group=self.rbr_dense.conv.group, has_bias=True, pad_mode="pad")
    self.rbr_reparam.weight.data = kernel
    self.rbr_reparam.bias.data = bias
    for para in self.parameters():
        para.detach_()
    self.__delattr__("rbr_dense")
    self.__delattr__("rbr_1x1")
    if hasattr(self, "rbr_identity"):
        self.__delattr__("rbr_identity")
    if hasattr(self, "id_tensor"):
        self.__delattr__("id_tensor")
    self.deploy = True
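The correctness of `switch_to_deploy` rests on the BN-folding identity used in `_fuse_bn_tensor`: BN(conv(x; W)) equals conv(x; W · γ/σ) plus the bias β − μ·γ/σ. This can be sketched in NumPy with a 1x1 conv (a plain matrix multiply); all values here are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
cin = cout = 4
x = rng.normal(size=(cin,))                 # one spatial position
w = rng.normal(size=(cout, cin))            # 1x1 conv == matrix multiply
gamma, beta = rng.normal(size=cout), rng.normal(size=cout)
mean = rng.normal(size=cout)
var, eps = rng.uniform(0.5, 2.0, cout), 1e-5

# Training-time branch: BN applied after the convolution.
y_train = gamma * (w @ x - mean) / np.sqrt(var + eps) + beta

# Deploy-time branch: rescaled kernel plus bias (the _fuse_bn_tensor fold).
std = np.sqrt(var + eps)
w_fused = w * (gamma / std)[:, None]
b_fused = beta - mean * gamma / std
y_deploy = w_fused @ x + b_fused

assert np.allclose(y_train, y_deploy)
```

The same fold is applied to each branch, the 1x1 kernel is zero-padded to 3x3, and the three folded kernels and biases are summed into `rbr_reparam`.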
mindocr.models.backbones.mindcv_models.repvgg.repvgg_a0(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get RepVGG model with num_blocks=[2, 4, 14, 1], width_multiplier=[0.75, 0.75, 0.75, 2.5]. Refer to the base class models.RepVGG for more details.

Source code in mindocr\models\backbones\mindcv_models\repvgg.py
@register_model
def repvgg_a0(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> RepVGG:
    """Get RepVGG model with num_blocks=[2, 4, 14, 1], width_multiplier=[0.75, 0.75, 0.75, 2.5].
    Refer to the base class `models.RepVGG` for more details.
    """
    default_cfg = default_cfgs["repvgg_a0"]
    model = RepVGG(num_blocks=[2, 4, 14, 1], num_classes=num_classes, in_channels=in_channels,
                   width_multiplier=[0.75, 0.75, 0.75, 2.5], override_group_map=None, deploy=False, **kwargs)
    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
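The `width_multiplier` list scales the per-stage channel counts. A hedged sketch of the convention (base widths [64, 128, 256, 512]; this simplifies slightly, since the actual stem width is additionally capped at 64; the helper name is hypothetical):

```python
def repvgg_stage_widths(width_multiplier):
    # Per-stage output channels in RepVGG, scaled from the base widths.
    base = [64, 128, 256, 512]
    return [int(b * w) for b, w in zip(base, width_multiplier)]

# repvgg_a0 uses width_multiplier=[0.75, 0.75, 0.75, 2.5]
assert repvgg_stage_widths([0.75, 0.75, 0.75, 2.5]) == [48, 96, 192, 1280]
```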
mindocr.models.backbones.mindcv_models.repvgg.repvgg_a1(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get RepVGG model with num_blocks=[2, 4, 14, 1], width_multiplier=[1.0, 1.0, 1.0, 2.5]. Refer to the base class models.RepVGG for more details.

Source code in mindocr\models\backbones\mindcv_models\repvgg.py
@register_model
def repvgg_a1(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> RepVGG:
    """Get RepVGG model with num_blocks=[2, 4, 14, 1], width_multiplier=[1.0, 1.0, 1.0, 2.5].
     Refer to the base class `models.RepVGG` for more details.
     """
    default_cfg = default_cfgs["repvgg_a1"]
    model = RepVGG(num_blocks=[2, 4, 14, 1], num_classes=num_classes, in_channels=in_channels,
                   width_multiplier=[1.0, 1.0, 1.0, 2.5], override_group_map=None, deploy=False, **kwargs)
    if pretrained:
        load_pretrained(model, default_cfg,
                        num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.repvgg.repvgg_a2(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get RepVGG model with num_blocks=[2, 4, 14, 1], width_multiplier=[1.5, 1.5, 1.5, 2.75]. Refer to the base class models.RepVGG for more details.

Source code in mindocr\models\backbones\mindcv_models\repvgg.py
@register_model
def repvgg_a2(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> RepVGG:
    """Get RepVGG model with num_blocks=[2, 4, 14, 1], width_multiplier=[1.5, 1.5, 1.5, 2.75].
     Refer to the base class `models.RepVGG` for more details.
     """
    default_cfg = default_cfgs["repvgg_a2"]
    model = RepVGG(num_blocks=[2, 4, 14, 1], num_classes=num_classes, in_channels=in_channels,
                   width_multiplier=[1.5, 1.5, 1.5, 2.75], override_group_map=None, deploy=False, **kwargs)
    if pretrained:
        load_pretrained(model, default_cfg,
                        num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.repvgg.repvgg_b0(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get RepVGG model with num_blocks=[4, 6, 16, 1], width_multiplier=[1.0, 1.0, 1.0, 2.5]. Refer to the base class models.RepVGG for more details.

Source code in mindocr\models\backbones\mindcv_models\repvgg.py
@register_model
def repvgg_b0(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> RepVGG:
    """Get RepVGG model with num_blocks=[4, 6, 16, 1], width_multiplier=[1.0, 1.0, 1.0, 2.5].
     Refer to the base class `models.RepVGG` for more details.
     """
    default_cfg = default_cfgs['repvgg_b0']
    model = RepVGG(num_blocks=[4, 6, 16, 1], num_classes=num_classes, in_channels=in_channels,
                   width_multiplier=[1.0, 1.0, 1.0, 2.5], override_group_map=None, deploy=False, **kwargs)
    if pretrained:
        load_pretrained(model, default_cfg,
                        num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.repvgg.repvgg_b1(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get RepVGG model with num_blocks=[4, 6, 16, 1], width_multiplier=[2.0, 2.0, 2.0, 4.0]. Refer to the base class models.RepVGG for more details.

Source code in mindocr\models\backbones\mindcv_models\repvgg.py
@register_model
def repvgg_b1(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> RepVGG:
    """Get RepVGG model with num_blocks=[4, 6, 16, 1], width_multiplier=[2.0, 2.0, 2.0, 4.0].
     Refer to the base class `models.RepVGG` for more details.
     """
    default_cfg = default_cfgs['repvgg_b1']
    model = RepVGG(num_blocks=[4, 6, 16, 1], num_classes=num_classes, in_channels=in_channels,
                   width_multiplier=[2.0, 2.0, 2.0, 4.0], override_group_map=None, deploy=False, **kwargs)
    if pretrained:
        load_pretrained(model, default_cfg,
                        num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.repvgg.repvgg_b2(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get RepVGG model with num_blocks=[4, 6, 16, 1], width_multiplier=[2.5, 2.5, 2.5, 5.0]. Refer to the base class models.RepVGG for more details.

Source code in mindocr\models\backbones\mindcv_models\repvgg.py
@register_model
def repvgg_b2(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> RepVGG:
    """Get RepVGG model with num_blocks=[4, 6, 16, 1], width_multiplier=[2.5, 2.5, 2.5, 5.0].
     Refer to the base class `models.RepVGG` for more details.
     """
    default_cfg = default_cfgs['repvgg_b2']
    model = RepVGG(num_blocks=[4, 6, 16, 1], num_classes=num_classes, in_channels=in_channels,
                   width_multiplier=[2.5, 2.5, 2.5, 5.0], override_group_map=None, deploy=False, **kwargs)
    if pretrained:
        load_pretrained(model, default_cfg,
                        num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.repvgg.repvgg_b3(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get RepVGG model with num_blocks=[4, 6, 16, 1], width_multiplier=[3.0, 3.0, 3.0, 5.0]. Refer to the base class models.RepVGG for more details.

Source code in mindocr\models\backbones\mindcv_models\repvgg.py
@register_model
def repvgg_b3(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> RepVGG:
    """Get RepVGG model with num_blocks=[4, 6, 16, 1], width_multiplier=[3.0, 3.0, 3.0, 5.0].
     Refer to the base class `models.RepVGG` for more details.
     """
    default_cfg = default_cfgs['repvgg_b3']
    model = RepVGG(num_blocks=[4, 6, 16, 1], num_classes=num_classes, in_channels=in_channels,
                   width_multiplier=[3.0, 3.0, 3.0, 5.0], override_group_map=None, deploy=False, **kwargs)
    if pretrained:
        load_pretrained(model, default_cfg,
                        num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.repvgg.repvgg_model_convert(model, save_path=None, do_copy=True)

Convert every reparameterizable block in a model to deploy mode, optionally saving a checkpoint of the converted model.

Source code in mindocr\models\backbones\mindcv_models\repvgg.py
def repvgg_model_convert(model: nn.Cell, save_path=None, do_copy=True):
    """repvgg_model_convert"""
    if do_copy:
        model = copy.deepcopy(model)
    for module in model.modules():
        if hasattr(module, "switch_to_deploy"):
            module.switch_to_deploy()
    if save_path is not None:
        save_checkpoint(model.parameters_and_names(), save_path)
    return model
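The conversion is just a duck-typed traversal: every module that exposes `switch_to_deploy` gets fused, everything else is left alone. A framework-free sketch of that pattern (all class names here are stand-ins, not MindOCR types):

```python
class PlainCell:
    """A module with no reparameterization support."""

class ReparamCell:
    """A module that can fuse itself for inference."""
    def __init__(self):
        self.deployed = False
    def switch_to_deploy(self):
        self.deployed = True

class TinyModel:
    def __init__(self):
        self.blocks = [ReparamCell(), PlainCell(), ReparamCell()]
    def modules(self):
        return self.blocks

def model_convert(model):
    # Mirrors repvgg_model_convert: fuse every block that supports it.
    for m in model.modules():
        if hasattr(m, "switch_to_deploy"):
            m.switch_to_deploy()
    return model

net = model_convert(TinyModel())
```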
mindocr.models.backbones.mindcv_models.res2net

MindSpore implementation of Res2Net. Refer to Res2Net: A New Multi-scale Backbone Architecture.
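Res2Net's multi-scale trick splits the bottleneck channels into `scale` groups and feeds each group's output into the next group before its 3x3 conv (y1 = x1; y2 = K2(x2); yi = Ki(xi + y(i-1)) for i > 2). A toy NumPy sketch of that data flow, with each "conv" replaced by a simple doubling for clarity (the helper is illustrative, not the library API):

```python
import numpy as np

def bottle2neck_splits(x, scale=4):
    """Hierarchical residual-style processing of channel splits."""
    splits = np.split(x, scale, axis=0)   # split along the channel axis
    outs = [splits[0]]                    # first split passes through
    prev = np.zeros_like(splits[0])
    for s in splits[1:]:
        prev = 2.0 * (s + prev)           # "conv" of split + previous output
        outs.append(prev)
    return np.concatenate(outs, axis=0)
```

Because later splits see the accumulated outputs of earlier ones, the effective receptive field grows with each split, which is the paper's multi-scale claim.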

mindocr.models.backbones.mindcv_models.res2net.Res2Net

Bases: nn.Cell

Res2Net model class, based on "Res2Net: A New Multi-scale Backbone Architecture" (https://arxiv.org/abs/1904.01169).

PARAMETER DESCRIPTION
block

block of resnet.

TYPE: Type[nn.Cell]

layer_nums

number of layers of each stage.

TYPE: List[int]

version

variety of Res2Net, 'res2net' or 'res2net_v1b'. Default: 'res2net'.

TYPE: str DEFAULT: 'res2net'

num_classes

number of classification classes. Default: 1000.

TYPE: int DEFAULT: 1000

in_channels

number of channels of the input. Default: 3.

TYPE: int DEFAULT: 3

groups

number of groups for group conv in blocks. Default: 1.

TYPE: int DEFAULT: 1

base_width

base width of per-group hidden channels in blocks. Default: 26.

TYPE: int DEFAULT: 26

scale

scale factor of Bottle2neck. Default: 4.

DEFAULT: 4

norm

normalization layer in blocks. Default: None.

TYPE: Optional[nn.Cell] DEFAULT: None

Source code in mindocr\models\backbones\mindcv_models\res2net.py
class Res2Net(nn.Cell):
    r"""Res2Net model class, based on
    `"Res2Net: A New Multi-scale Backbone Architecture" <https://arxiv.org/abs/1904.01169>`_

    Args:
        block: block of resnet.
        layer_nums: number of layers of each stage.
        version: variety of Res2Net, 'res2net' or 'res2net_v1b'. Default: 'res2net'.
        num_classes: number of classification classes. Default: 1000.
        in_channels: number the channels of the input. Default: 3.
        groups: number of groups for group conv in blocks. Default: 1.
        base_width: base width of pre group hidden channel in blocks. Default: 26.
        scale: scale factor of Bottle2neck. Default: 4.
        norm: normalization layer in blocks. Default: None.
    """

    def __init__(
        self,
        block: Type[nn.Cell],
        layer_nums: List[int],
        version: str = "res2net",
        num_classes: int = 1000,
        in_channels: int = 3,
        groups: int = 1,
        base_width: int = 26,
        scale=4,
        norm: Optional[nn.Cell] = None,
    ) -> None:
        super().__init__()
        assert version in ["res2net", "res2net_v1b"]
        self.version = version

        if norm is None:
            norm = nn.BatchNorm2d
        self.norm = norm

        self.num_classes = num_classes
        self.input_channels = 64
        self.groups = groups
        self.base_width = base_width
        self.scale = scale
        if self.version == "res2net":
            self.conv1 = nn.Conv2d(in_channels, self.input_channels, kernel_size=7,
                                   stride=2, padding=3, pad_mode="pad")
        elif self.version == "res2net_v1b":
            self.conv1 = nn.SequentialCell([
                nn.Conv2d(in_channels, self.input_channels // 2, kernel_size=3,
                          stride=2, padding=1, pad_mode="pad"),
                norm(self.input_channels // 2),
                nn.ReLU(),
                nn.Conv2d(self.input_channels // 2, self.input_channels // 2, kernel_size=3,
                          stride=1, padding=1, pad_mode="pad"),
                norm(self.input_channels // 2),
                nn.ReLU(),
                nn.Conv2d(self.input_channels // 2, self.input_channels, kernel_size=3,
                          stride=1, padding=1, pad_mode="pad"),
            ])

        self.bn1 = norm(self.input_channels)
        self.relu = nn.ReLU()
        self.max_pool = nn.SequentialCell([
            nn.Pad(paddings=((0, 0), (0, 0), (1, 1), (1, 1)), mode="CONSTANT"),
            nn.MaxPool2d(kernel_size=3, stride=2)
        ])
        self.layer1 = self._make_layer(block, 64, layer_nums[0])
        self.layer2 = self._make_layer(block, 128, layer_nums[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layer_nums[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layer_nums[3], stride=2)

        self.pool = GlobalAvgPooling()
        self.num_features = 512 * block.expansion
        self.classifier = nn.Dense(self.num_features, num_classes)
        self._initialize_weights()

    def _initialize_weights(self) -> None:
        """Initialize weights for cells."""
        for _, cell in self.cells_and_names():
            if isinstance(cell, nn.Conv2d):
                cell.weight.set_data(
                    init.initializer(init.HeNormal(math.sqrt(5), mode="fan_out", nonlinearity="relu"),
                                     cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(
                        init.initializer(init.HeUniform(math.sqrt(5), mode="fan_in", nonlinearity="leaky_relu"),
                                         cell.bias.shape, cell.bias.dtype))
            elif isinstance(cell, nn.BatchNorm2d):
                cell.gamma.set_data(init.initializer("ones", cell.gamma.shape, cell.gamma.dtype))
                cell.beta.set_data(init.initializer("zeros", cell.beta.shape, cell.beta.dtype))
            elif isinstance(cell, nn.Dense):
                cell.weight.set_data(
                    init.initializer(init.HeUniform(math.sqrt(5), mode="fan_in", nonlinearity="leaky_relu"),
                                     cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(init.initializer("zeros", cell.bias.shape, cell.bias.dtype))

    def _make_layer(
        self,
        block: Type[nn.Cell],
        channels: int,
        block_nums: int,
        stride: int = 1,
    ) -> nn.SequentialCell:
        down_sample = None

        if stride != 1 or self.input_channels != channels * block.expansion:
            if stride == 1 or self.version == "res2net":
                down_sample = nn.SequentialCell([
                    nn.Conv2d(self.input_channels, channels * block.expansion, kernel_size=1, stride=stride),
                    self.norm(channels * block.expansion)
                ])
            else:
                down_sample = nn.SequentialCell([
                    nn.AvgPool2d(kernel_size=stride, stride=stride, pad_mode="same"),
                    nn.Conv2d(self.input_channels, channels * block.expansion, kernel_size=1, stride=1),
                    self.norm(channels * block.expansion)
                ])

        layers = []
        layers.append(
            block(
                self.input_channels,
                channels,
                stride=stride,
                down_sample=down_sample,
                groups=self.groups,
                base_width=self.base_width,
                scale=self.scale,
                stype="stage",
                norm=self.norm,
            )
        )
        self.input_channels = channels * block.expansion

        for _ in range(1, block_nums):
            layers.append(
                block(
                    self.input_channels,
                    channels,
                    groups=self.groups,
                    base_width=self.base_width,
                    scale=self.scale,
                    norm=self.norm,
                )
            )

        return nn.SequentialCell(layers)

    def forward_features(self, x: Tensor) -> Tensor:
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.max_pool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        return x

    def forward_head(self, x: Tensor) -> Tensor:
        x = self.pool(x)
        x = self.classifier(x)
        return x

    def construct(self, x: Tensor) -> Tensor:
        x = self.forward_features(x)
        x = self.forward_head(x)
        return x
mindocr.models.backbones.mindcv_models.res2net.res2net101(pretrained=False, num_classes=1001, in_channels=3, **kwargs)

Get 101 layers Res2Net model. Refer to the base class models.Res2Net for more details.

Source code in mindocr\models\backbones\mindcv_models\res2net.py
@register_model
def res2net101(pretrained: bool = False, num_classes: int = 1001, in_channels=3, **kwargs):
    """Get 101 layers Res2Net model.
    Refer to the base class `models.Res2Net` for more details.
    """
    default_cfg = default_cfgs["res2net101"]
    model = Res2Net(Bottle2neck, [3, 4, 23, 3], num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.res2net.res2net152(pretrained=False, num_classes=1001, in_channels=3, **kwargs)

Get 152 layers Res2Net model. Refer to the base class models.Res2Net for more details.

Source code in mindocr\models\backbones\mindcv_models\res2net.py
@register_model
def res2net152(pretrained: bool = False, num_classes: int = 1001, in_channels=3, **kwargs):
    """Get 152 layers Res2Net model.
    Refer to the base class `models.Res2Net` for more details.
    """
    default_cfg = default_cfgs["res2net152"]
    model = Res2Net(Bottle2neck, [3, 8, 36, 3], num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.res2net.res2net50(pretrained=False, num_classes=1001, in_channels=3, **kwargs)

Get 50 layers Res2Net model. Refer to the base class models.Res2Net for more details.

Source code in mindocr\models\backbones\mindcv_models\res2net.py
@register_model
def res2net50(pretrained: bool = False, num_classes: int = 1001, in_channels=3, **kwargs):
    """Get 50 layers Res2Net model.
    Refer to the base class `models.Res2Net` for more details.
    """
    default_cfg = default_cfgs["res2net50"]
    model = Res2Net(Bottle2neck, [3, 4, 6, 3], num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
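The model names follow the ResNet depth convention: each Bottle2neck contributes three weighted layers (1x1, grouped 3x3, 1x1), plus the single 7x7 stem conv of the default "res2net" variant and the final classifier. A hypothetical helper to check the arithmetic:

```python
def res2net_depth(layer_nums):
    # 3 weighted layers per Bottle2neck, plus the stem conv and classifier.
    return 3 * sum(layer_nums) + 2

assert res2net_depth([3, 4, 6, 3]) == 50     # res2net50
assert res2net_depth([3, 4, 23, 3]) == 101   # res2net101
assert res2net_depth([3, 8, 36, 3]) == 152   # res2net152
```

Note the "res2net_v1b" variant uses a three-conv deep stem, so its weighted-layer count differs slightly from the name.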
mindocr.models.backbones.mindcv_models.resnest

MindSpore implementation of ResNeSt. Refer to ResNeSt: Split-Attention Networks.

mindocr.models.backbones.mindcv_models.resnest.Bottleneck

Bases: nn.Cell

ResNeSt Bottleneck

Source code in mindocr\models\backbones\mindcv_models\resnest.py
class Bottleneck(nn.Cell):
    """ResNeSt Bottleneck"""

    expansion = 4

    def __init__(
        self,
        inplanes: int,
        planes: int,
        stride=1,
        downsample: Optional[nn.SequentialCell] = None,
        radix: int = 1,
        cardinality: int = 1,
        bottleneck_width: int = 64,
        avd: bool = False,
        avd_first: bool = False,
        dilation: int = 1,
        is_first: bool = False,
        norm_layer: Optional[nn.Cell] = None,
    ) -> None:
        super(Bottleneck, self).__init__()
        group_width = int(planes * (bottleneck_width / 64.0)) * cardinality
        self.conv1 = nn.Conv2d(inplanes, group_width, kernel_size=1, has_bias=False)
        self.bn1 = norm_layer(group_width)
        self.radix = radix
        self.avd = avd and (stride > 1 or is_first)
        self.avd_first = avd_first

        if self.avd:
            self.avd_layer = nn.AvgPool2d(3, stride, pad_mode="same")
            stride = 1

        if radix >= 1:
            self.conv2 = SplitAttn(group_width, group_width, kernel_size=3, stride=stride,
                                   padding=dilation, dilation=dilation, group=cardinality,
                                   bias=False, radix=radix, norm_layer=norm_layer)
        else:
            self.conv2 = nn.Conv2d(group_width, group_width, kernel_size=3, stride=stride,
                                   pad_mode="pad", padding=dilation, dilation=dilation,
                                   group=cardinality, has_bias=False)
            self.bn2 = norm_layer(group_width)

        self.conv3 = nn.Conv2d(group_width, planes * 4, kernel_size=1, has_bias=False)
        self.bn3 = norm_layer(planes * 4)

        self.relu = nn.ReLU()
        self.downsample = downsample
        self.dilation = dilation
        self.stride = stride

    def construct(self, x: Tensor) -> Tensor:
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        if self.avd and self.avd_first:
            out = self.avd_layer(out)

        out = self.conv2(out)
        if self.radix == 0:
            out = self.bn2(out)
            out = self.relu(out)

        if self.avd and not self.avd_first:
            out = self.avd_layer(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out
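The bottleneck's intermediate width scales with both `bottleneck_width` and `cardinality`, as computed on the first line of `__init__`. A small helper mirroring that expression (the function itself is illustrative):

```python
def group_width(planes, bottleneck_width=64, cardinality=1):
    # Width of the intermediate conv in a ResNeSt Bottleneck.
    return int(planes * (bottleneck_width / 64.0)) * cardinality

assert group_width(64) == 64
assert group_width(64, bottleneck_width=64, cardinality=2) == 128
assert group_width(128, bottleneck_width=40) == 80   # 128 * 40/64
```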
mindocr.models.backbones.mindcv_models.resnest.ResNeSt

Bases: nn.Cell

ResNeSt model class, based on "ResNeSt: Split-Attention Networks" (https://arxiv.org/abs/2004.08955).

PARAMETER DESCRIPTION
block

Class for the residual block. Option is Bottleneck.

TYPE: Type[Bottleneck]

layers

Numbers of layers in each block.

TYPE: List[int]

radix

Number of groups for Split-Attention conv. Default: 1.

TYPE: int DEFAULT: 1

group

Number of groups for the conv in each bottleneck block. Default: 1.

TYPE: int DEFAULT: 1

bottleneck_width

bottleneck channels factor. Default: 64.

TYPE: int DEFAULT: 64

num_classes

Number of classification classes. Default: 1000.

TYPE: int DEFAULT: 1000

dilated

Applying dilation strategy to pretrained ResNeSt yielding a stride-8 model, typically used in Semantic Segmentation. Default: False.

TYPE: bool DEFAULT: False

dilation

Number of dilation in the conv. Default: 1.

TYPE: int DEFAULT: 1

deep_stem

replace the 7x7 stem conv with three 3x3 convolution layers of widths stem_width, stem_width, stem_width * 2. Default: False.

TYPE: bool DEFAULT: False

stem_width

number of channels in stem convolutions. Default: 64.

TYPE: int DEFAULT: 64

avg_down

use avg pooling for projection skip connection between stages/downsample. Default: False.

TYPE: bool DEFAULT: False

avd

add an avg pooling layer around the split-attention conv (position controlled by avd_first). Default: False.

TYPE: bool DEFAULT: False

avd_first

apply the avg pooling before the split-attention conv instead of after it. Default: False.

TYPE: bool DEFAULT: False

drop_rate

Drop probability for the Dropout layer. Default: 0.

TYPE: float DEFAULT: 0.0

norm_layer

Normalization layer used in backbone network. Default: nn.BatchNorm2d.

TYPE: nn.Cell DEFAULT: nn.BatchNorm2d

Source code in mindocr\models\backbones\mindcv_models\resnest.py
class ResNeSt(nn.Cell):
    r"""ResNeSt model class, based on
    `"ResNeSt: Split-Attention Networks" <https://arxiv.org/abs/2004.08955>`_

    Args:
        block: Class for the residual block. Option is Bottleneck.
        layers: Numbers of layers in each block.
        radix: Number of groups for Split-Attention conv. Default: 1.
        group: Number of groups for the conv in each bottleneck block. Default: 1.
        bottleneck_width: bottleneck channels factor. Default: 64.
        num_classes: Number of classification classes. Default: 1000.
        dilated: Applying dilation strategy to pretrained ResNeSt yielding a stride-8 model,
                 typically used in Semantic Segmentation. Default: False.
        dilation: Number of dilation in the conv. Default: 1.
        deep_stem: three 3x3 convolution layers of widths stem_width, stem_width, stem_width * 2.
                   Default: False.
        stem_width: number of channels in stem convolutions. Default: 64.
        avg_down: use avg pooling for projection skip connection between stages/downsample.
                  Default: False.
        avd: use avg pooling before or after split-attention conv. Default: False.
        avd_first: use avg pooling before or after split-attention conv. Default: False.
        drop_rate: Drop probability for the Dropout layer. Default: 0.
        norm_layer: Normalization layer used in backbone network. Default: nn.BatchNorm2d.
    """

    def __init__(
        self,
        block: Type[Bottleneck],
        layers: List[int],
        radix: int = 1,
        group: int = 1,
        bottleneck_width: int = 64,
        num_classes: int = 1000,
        dilated: bool = False,
        dilation: int = 1,
        deep_stem: bool = False,
        stem_width: int = 64,
        avg_down: bool = False,
        avd: bool = False,
        avd_first: bool = False,
        drop_rate: float = 0.0,
        norm_layer: nn.Cell = nn.BatchNorm2d,
    ) -> None:
        super(ResNeSt, self).__init__()
        self.cardinality = group
        self.bottleneck_width = bottleneck_width
        # ResNet-D params
        self.inplanes = stem_width * 2 if deep_stem else 64
        self.avg_down = avg_down
        # ResNeSt params
        self.radix = radix
        self.avd = avd
        self.avd_first = avd_first

        if deep_stem:
            self.conv1 = nn.SequentialCell([
                nn.Conv2d(3, stem_width, kernel_size=3, stride=2, pad_mode="pad",
                          padding=1, has_bias=False),
                norm_layer(stem_width),
                nn.ReLU(),
                nn.Conv2d(stem_width, stem_width, kernel_size=3, stride=1, pad_mode="pad",
                          padding=1, has_bias=False),
                norm_layer(stem_width),
                nn.ReLU(),
                nn.Conv2d(stem_width, stem_width * 2, kernel_size=3, stride=1, pad_mode="pad",
                          padding=1, has_bias=False),
            ])
        else:
            self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, pad_mode="pad", padding=3,
                                   has_bias=False)

        self.bn1 = norm_layer(self.inplanes)
        self.relu = nn.ReLU()
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode="same")

        self.layer1 = self._make_layer(block, 64, layers[0], norm_layer=norm_layer, is_first=False)
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2, norm_layer=norm_layer)
        if dilated or dilation == 4:
            self.layer3 = self._make_layer(block, 256, layers[2], stride=1, dilation=2, norm_layer=norm_layer)
            self.layer4 = self._make_layer(block, 512, layers[3], stride=1, dilation=4, norm_layer=norm_layer)
        elif dilation == 2:
            self.layer3 = self._make_layer(block, 256, layers[2], stride=2, dilation=1, norm_layer=norm_layer)
            self.layer4 = self._make_layer(block, 512, layers[3], stride=1, dilation=2, norm_layer=norm_layer)
        else:
            self.layer3 = self._make_layer(block, 256, layers[2], stride=2, norm_layer=norm_layer)
            self.layer4 = self._make_layer(block, 512, layers[3], stride=2, norm_layer=norm_layer)
        self.avgpool = GlobalAvgPooling()
        self.drop = nn.Dropout(keep_prob=1.0 - drop_rate) if drop_rate > 0.0 else None
        self.fc = nn.Dense(512 * block.expansion, num_classes)

        self._initialize_weights()

    def _initialize_weights(self) -> None:
        """Initialize weights for cells."""
        for _, cell in self.cells_and_names():
            if isinstance(cell, nn.Conv2d):
                cell.weight.set_data(
                    init.initializer(
                        init.HeNormal(mode="fan_out", nonlinearity="relu"), cell.weight.shape, cell.weight.dtype
                    )
                )
                if cell.bias is not None:
                    cell.bias.set_data(init.initializer("zeros", cell.bias.shape, cell.bias.dtype))
            elif isinstance(cell, nn.BatchNorm2d):
                cell.gamma.set_data(init.initializer("ones", cell.gamma.shape, cell.gamma.dtype))
                cell.beta.set_data(init.initializer("zeros", cell.beta.shape, cell.beta.dtype))
            elif isinstance(cell, nn.Dense):
                cell.weight.set_data(
                    init.initializer(
                        init.HeUniform(mode="fan_in", nonlinearity="sigmoid"), cell.weight.shape, cell.weight.dtype
                    )
                )
                if cell.bias is not None:
                    cell.bias.set_data(init.initializer("zeros", cell.bias.shape, cell.bias.dtype))

    def _make_layer(
        self,
        block: Type[Bottleneck],
        planes: int,
        blocks: int,
        stride: int = 1,
        dilation: int = 1,
        norm_layer: Optional[nn.Cell] = None,
        is_first: bool = True,
    ) -> nn.SequentialCell:
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            down_layers = []
            if self.avg_down:
                if dilation == 1:
                    down_layers.append(nn.AvgPool2d(kernel_size=stride, stride=stride, pad_mode="valid"))
                else:
                    down_layers.append(nn.AvgPool2d(kernel_size=1, stride=1, pad_mode="valid"))

                down_layers.append(nn.Conv2d(self.inplanes, planes * block.expansion, kernel_size=1,
                                             stride=1, has_bias=False))
            else:
                down_layers.append(
                    nn.Conv2d(self.inplanes, planes * block.expansion, kernel_size=1, stride=stride,
                              has_bias=False))
            down_layers.append(norm_layer(planes * block.expansion))
            downsample = nn.SequentialCell(down_layers)

        layers = []
        if dilation == 1 or dilation == 2:
            layers.append(
                block(
                    self.inplanes,
                    planes,
                    stride,
                    downsample=downsample,
                    radix=self.radix,
                    cardinality=self.cardinality,
                    bottleneck_width=self.bottleneck_width,
                    avd=self.avd,
                    avd_first=self.avd_first,
                    dilation=1,
                    is_first=is_first,
                    norm_layer=norm_layer,
                )
            )
        elif dilation == 4:
            layers.append(
                block(
                    self.inplanes,
                    planes,
                    stride,
                    downsample=downsample,
                    radix=self.radix,
                    cardinality=self.cardinality,
                    bottleneck_width=self.bottleneck_width,
                    avd=self.avd,
                    avd_first=self.avd_first,
                    dilation=2,
                    is_first=is_first,
                    norm_layer=norm_layer,
                )
            )
        else:
            raise ValueError(f"Unsupported model type {dilation}")

        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(
                block(
                    self.inplanes,
                    planes,
                    radix=self.radix,
                    cardinality=self.cardinality,
                    bottleneck_width=self.bottleneck_width,
                    avd=self.avd,
                    avd_first=self.avd_first,
                    dilation=dilation,
                    norm_layer=norm_layer,
                )
            )
        return nn.SequentialCell(layers)

    def forward_features(self, x: Tensor) -> Tensor:
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        return x

    def forward_head(self, x: Tensor) -> Tensor:
        x = self.avgpool(x)
        if self.drop:
            x = self.drop(x)
        x = self.fc(x)
        return x

    def construct(self, x: Tensor) -> Tensor:
        x = self.forward_features(x)
        x = self.forward_head(x)
        return x
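The stride/dilation branching in `__init__` above determines the network's overall output stride. The following pure-Python sketch (an illustration, not library code) tracks the downsampling factor through the stem and the four stages for each branch:

```python
# Sketch of how the `dilated` / `dilation` flags trade stride for dilation:
# with dilated=True, layer3 and layer4 keep stride 1 (using dilation 2/4),
# so the overall output stride drops from 32 to 8.
def output_stride(dilated: bool = False, dilation: int = 1) -> int:
    stride = 4   # stem conv (stride 2) + max-pool (stride 2)
    stride *= 2  # layer2 always downsamples; layer1 keeps stride 1
    if dilated or dilation == 4:
        pass         # layer3/layer4: stride 1, dilation 2/4
    elif dilation == 2:
        stride *= 2  # layer3 downsamples; layer4: stride 1, dilation 2
    else:
        stride *= 4  # layer3 and layer4 both downsample
    return stride

print(output_stride())              # 32 (standard classification backbone)
print(output_stride(dilated=True))  # 8  (stride-8 model for segmentation)
print(output_stride(dilation=2))    # 16
```

This is why the docstring describes `dilated=True` as yielding a stride-8 model for semantic segmentation.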
mindocr.models.backbones.mindcv_models.resnest.SplitAttn

Bases: nn.Cell

Split-Attention Conv2d

Source code in mindocr\models\backbones\mindcv_models\resnest.py
class SplitAttn(nn.Cell):
    """Split-Attention Conv2d"""

    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        kernel_size: int = 3,
        stride: int = 1,
        padding: int = 0,
        dilation: int = 1,
        group: int = 1,
        bias: bool = False,
        radix: int = 2,
        rd_ratio: float = 0.25,
        rd_channels: Optional[int] = None,
        rd_divisor: int = 8,
        act_layer: nn.Cell = nn.ReLU,
        norm_layer: Optional[nn.Cell] = None,
    ) -> None:
        super(SplitAttn, self).__init__()
        out_channels = out_channels or in_channels
        self.radix = radix
        mid_chs = out_channels * radix

        if rd_channels is None:
            attn_chs = make_divisible(in_channels * radix * rd_ratio, min_value=32, divisor=rd_divisor)
        else:
            attn_chs = rd_channels * radix

        padding = kernel_size // 2 if padding is None else padding

        self.conv = nn.Conv2d(in_channels, mid_chs, kernel_size=kernel_size, stride=stride,
                              pad_mode="pad", padding=padding, dilation=dilation,
                              group=group * radix, has_bias=bias)
        self.bn0 = norm_layer(mid_chs) if norm_layer else Identity()
        self.act0 = act_layer()
        self.fc1 = nn.Conv2d(out_channels, attn_chs, 1, group=group, has_bias=True)
        self.bn1 = norm_layer(attn_chs) if norm_layer else Identity()
        self.act1 = act_layer()
        self.fc2 = nn.Conv2d(attn_chs, mid_chs, 1, group=group, has_bias=True)
        self.rsoftmax = RadixSoftmax(radix, group)
        self.pool = GlobalAvgPooling(keep_dims=True)

    def construct(self, x: Tensor) -> Tensor:
        x = self.conv(x)
        x = self.bn0(x)
        x = self.act0(x)

        B, RC, H, W = x.shape
        if self.radix > 1:
            x = ops.reshape(x, (B, self.radix, RC // self.radix, H, W))
            x_gap = x.sum(axis=1)
        else:
            x_gap = x
        x_gap = self.pool(x_gap)
        x_gap = self.fc1(x_gap)
        x_gap = self.bn1(x_gap)
        x_gap = self.act1(x_gap)
        x_attn = self.fc2(x_gap)

        x_attn = self.rsoftmax(x_attn)
        x_attn = ops.reshape(x_attn, (B, -1, 1, 1))
        if self.radix > 1:
            out = x * ops.reshape(x_attn, (B, self.radix, RC // self.radix, 1, 1))
            out = out.sum(axis=1)
        else:
            out = x * x_attn

        return out
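The radix-weighted sum in `construct` above can be illustrated with plain NumPy. This is a simplified sketch (an assumption for illustration, covering the `group=1` case only): attention logits are softmax-normalised across the radix axis, so the `radix` weights for each channel sum to 1 before the weighted sum over splits:

```python
import numpy as np

def split_attention(x, attn_logits, radix):
    """Weighted sum over radix splits, mirroring SplitAttn.construct for group=1."""
    B, RC, H, W = x.shape
    C = RC // radix
    # RadixSoftmax: softmax over the radix axis, per output channel
    logits = attn_logits.reshape(B, radix, C)
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    attn = e / e.sum(axis=1, keepdims=True)        # (B, radix, C), sums to 1 over radix
    x = x.reshape(B, radix, C, H, W)
    return (x * attn[..., None, None]).sum(axis=1)  # (B, C, H, W)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8, 4, 4))   # radix=2 -> 4 output channels
logits = rng.standard_normal((2, 8))
out = split_attention(x, logits, radix=2)
print(out.shape)  # (2, 4, 4, 4)
```

With equal logits every split gets weight `1/radix`, so the output reduces to the mean over splits.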
mindocr.models.backbones.mindcv_models.resnet

MindSpore implementation of ResNet. Refer to Deep Residual Learning for Image Recognition.

mindocr.models.backbones.mindcv_models.resnet.BasicBlock

Bases: nn.Cell

define the basic block of resnet

Source code in mindocr\models\backbones\mindcv_models\resnet.py
class BasicBlock(nn.Cell):
    """define the basic block of resnet"""
    expansion: int = 1

    def __init__(
        self,
        in_channels: int,
        channels: int,
        stride: int = 1,
        groups: int = 1,
        base_width: int = 64,
        norm: Optional[nn.Cell] = None,
        down_sample: Optional[nn.Cell] = None,
    ) -> None:
        super().__init__()
        if norm is None:
            norm = nn.BatchNorm2d
        assert groups == 1, "BasicBlock only supports groups=1"
        assert base_width == 64, "BasicBlock only supports base_width=64"

        self.conv1 = nn.Conv2d(in_channels, channels, kernel_size=3,
                               stride=stride, padding=1, pad_mode="pad")
        self.bn1 = norm(channels)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3,
                               stride=1, padding=1, pad_mode="pad")
        self.bn2 = norm(channels)
        self.down_sample = down_sample

    def construct(self, x: Tensor) -> Tensor:
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.down_sample is not None:
            identity = self.down_sample(x)

        out += identity
        out = self.relu(out)

        return out
mindocr.models.backbones.mindcv_models.resnet.Bottleneck

Bases: nn.Cell

Bottleneck here places the stride for downsampling at the 3x3 convolution (self.conv2), as torchvision does, while the original implementation places the stride at the first 1x1 convolution (self.conv1).

Source code in mindocr\models\backbones\mindcv_models\resnet.py
class Bottleneck(nn.Cell):
    """
    Bottleneck here places the stride for downsampling at 3x3 convolution(self.conv2) as torchvision does,
    while original implementation places the stride at the first 1x1 convolution(self.conv1)
    """
    expansion: int = 4

    def __init__(
        self,
        in_channels: int,
        channels: int,
        stride: int = 1,
        groups: int = 1,
        base_width: int = 64,
        norm: Optional[nn.Cell] = None,
        down_sample: Optional[nn.Cell] = None,
    ) -> None:
        super().__init__()
        if norm is None:
            norm = nn.BatchNorm2d

        width = int(channels * (base_width / 64.0)) * groups

        self.conv1 = nn.Conv2d(in_channels, width, kernel_size=1, stride=1)
        self.bn1 = norm(width)
        self.conv2 = nn.Conv2d(width, width, kernel_size=3, stride=stride,
                               padding=1, pad_mode="pad", group=groups)
        self.bn2 = norm(width)
        self.conv3 = nn.Conv2d(width, channels * self.expansion,
                               kernel_size=1, stride=1)
        self.bn3 = norm(channels * self.expansion)
        self.relu = nn.ReLU()
        self.down_sample = down_sample

    def construct(self, x: Tensor) -> Tensor:
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.down_sample is not None:
            identity = self.down_sample(x)

        out += identity
        out = self.relu(out)

        return out
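The `width` formula in `__init__` is what turns this block into a ResNeXt block when `groups` and `base_width` are changed. A small sketch (illustrative, not library code) of that arithmetic:

```python
# width = int(channels * base_width / 64) * groups, as in Bottleneck.__init__.
# Plain ResNet keeps width == channels; ResNeXt-style settings widen it.
def bottleneck_width(channels: int, base_width: int = 64, groups: int = 1) -> int:
    return int(channels * (base_width / 64.0)) * groups

print(bottleneck_width(64))                           # plain ResNet: 64
print(bottleneck_width(64, base_width=4, groups=32))  # ResNeXt 32x4d: 128
print(bottleneck_width(64, base_width=4, groups=64))  # ResNeXt 64x4d: 256
```

The widened `width` is then split across `groups` grouped convolutions in `self.conv2`, and `self.conv3` projects back to `channels * expansion`.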
mindocr.models.backbones.mindcv_models.resnet.ResNet

Bases: nn.Cell

ResNet model class, based on "Deep Residual Learning for Image Recognition" (https://arxiv.org/abs/1512.03385).

PARAMETER DESCRIPTION
block

block of resnet.

TYPE: Type[Union[BasicBlock, Bottleneck]]

layers

number of layers of each stage.

TYPE: List[int]

num_classes

number of classification classes. Default: 1000.

TYPE: int DEFAULT: 1000

in_channels

number of channels of the input. Default: 3.

TYPE: int DEFAULT: 3

groups

number of groups for group conv in blocks. Default: 1.

TYPE: int DEFAULT: 1

base_width

base width of per-group hidden channels in blocks. Default: 64.

TYPE: int DEFAULT: 64

norm

normalization layer in blocks. Default: None.

TYPE: Optional[nn.Cell] DEFAULT: None

Source code in mindocr\models\backbones\mindcv_models\resnet.py
class ResNet(nn.Cell):
    r"""ResNet model class, based on
    `"Deep Residual Learning for Image Recognition" <https://arxiv.org/abs/1512.03385>`_

    Args:
        block: block of resnet.
        layers: number of layers of each stage.
        num_classes: number of classification classes. Default: 1000.
        in_channels: number of channels of the input. Default: 3.
        groups: number of groups for group conv in blocks. Default: 1.
        base_width: base width of per-group hidden channels in blocks. Default: 64.
        norm: normalization layer in blocks. Default: None.
    """

    def __init__(
        self,
        block: Type[Union[BasicBlock, Bottleneck]],
        layers: List[int],
        num_classes: int = 1000,
        in_channels: int = 3,
        groups: int = 1,
        base_width: int = 64,
        norm: Optional[nn.Cell] = None,
    ) -> None:
        super().__init__()
        if norm is None:
            norm = nn.BatchNorm2d

        self.norm: nn.Cell = norm  # add type hints to make pylint happy
        self.input_channels = 64
        self.groups = groups
        self.base_with = base_width

        self.conv1 = nn.Conv2d(in_channels, self.input_channels, kernel_size=7,
                               stride=2, pad_mode="pad", padding=3)
        self.bn1 = norm(self.input_channels)
        self.relu = nn.ReLU()
        self.feature_info = [dict(chs=self.input_channels, reduction=2, name="relu")]

        self.max_pool = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode="same")
        self.layer1 = self._make_layer(block, 64, layers[0], name="layer1", reduction=4)
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2, name="layer2", reduction=8)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2, name="layer3", reduction=16)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2, name="layer4", reduction=32)

        self.pool = GlobalAvgPooling()
        self.num_features = 512 * block.expansion
        self.classifier = nn.Dense(self.num_features, num_classes)

        self._initialize_weights()

    def _initialize_weights(self) -> None:
        """Initialize weights for cells."""
        for _, cell in self.cells_and_names():
            if isinstance(cell, nn.Conv2d):
                cell.weight.set_data(
                    init.initializer(init.HeNormal(mode='fan_out', nonlinearity='relu'),
                                     cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(
                        init.initializer('zeros', cell.bias.shape, cell.bias.dtype))
            elif isinstance(cell, nn.BatchNorm2d):
                cell.gamma.set_data(init.initializer('ones', cell.gamma.shape, cell.gamma.dtype))
                cell.beta.set_data(init.initializer('zeros', cell.beta.shape, cell.beta.dtype))
            elif isinstance(cell, nn.Dense):
                cell.weight.set_data(
                    init.initializer(init.HeUniform(mode='fan_in', nonlinearity='sigmoid'),
                                     cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(init.initializer('zeros', cell.bias.shape, cell.bias.dtype))

    def _make_layer(
        self,
        block: Type[Union[BasicBlock, Bottleneck]],
        channels: int,
        block_nums: int,
        stride: int = 1,
        name: str = "",
        reduction: int = 1,
    ) -> nn.SequentialCell:
        """build model depending on cfgs"""
        down_sample = None

        if stride != 1 or self.input_channels != channels * block.expansion:
            down_sample = nn.SequentialCell([
                nn.Conv2d(self.input_channels, channels * block.expansion, kernel_size=1, stride=stride),
                self.norm(channels * block.expansion)
            ])

        layers = []
        layers.append(
            block(
                self.input_channels,
                channels,
                stride=stride,
                down_sample=down_sample,
                groups=self.groups,
                base_width=self.base_with,
                norm=self.norm,
            )
        )
        self.input_channels = channels * block.expansion

        for _ in range(1, block_nums):
            layers.append(
                block(
                    self.input_channels,
                    channels,
                    groups=self.groups,
                    base_width=self.base_with,
                    norm=self.norm
                )
            )

        self.feature_info.append(dict(chs=self.input_channels, reduction=reduction, name=name))

        return nn.SequentialCell(layers)

    def forward_features(self, x: Tensor) -> Tensor:
        """Network forward feature extraction."""
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.max_pool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        return x

    def forward_head(self, x: Tensor) -> Tensor:
        x = self.pool(x)
        x = self.classifier(x)
        return x

    def construct(self, x: Tensor) -> Tensor:
        x = self.forward_features(x)
        x = self.forward_head(x)
        return x
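The `reduction` values recorded in `feature_info` above give the cumulative downsampling factor of each stage. A quick illustrative check (an assumption for demonstration, not library code) of what they mean for a standard 224x224 input:

```python
# Feature-map side length per stage for a 224x224 input, from the
# reduction factors stored in ResNet.feature_info.
reductions = {"relu": 2, "layer1": 4, "layer2": 8, "layer3": 16, "layer4": 32}
sizes = {name: 224 // r for name, r in reductions.items()}
print(sizes)
# {'relu': 112, 'layer1': 56, 'layer2': 28, 'layer3': 14, 'layer4': 7}
```

Downstream OCR heads use `feature_info` to pick intermediate feature maps at the reduction they need.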
mindocr.models.backbones.mindcv_models.resnet.ResNet.forward_features(x)

Network forward feature extraction.

Source code in mindocr\models\backbones\mindcv_models\resnet.py
def forward_features(self, x: Tensor) -> Tensor:
    """Network forward feature extraction."""
    x = self.conv1(x)
    x = self.bn1(x)
    x = self.relu(x)
    x = self.max_pool(x)

    x = self.layer1(x)
    x = self.layer2(x)
    x = self.layer3(x)
    x = self.layer4(x)
    return x
mindocr.models.backbones.mindcv_models.resnet.resnet101(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get 101 layers ResNet model. Refer to the base class models.ResNet for more details.

Source code in mindocr\models\backbones\mindcv_models\resnet.py
@register_model
def resnet101(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs):
    """Get 101 layers ResNet model.
    Refer to the base class `models.ResNet` for more details.
    """
    default_cfg = default_cfgs["resnet101"]
    model = ResNet(Bottleneck, [3, 4, 23, 3], num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
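The layer configuration passed to each factory determines the depth in the model name. A quick sanity check (illustrative arithmetic, not library code): Bottleneck blocks hold 3 convolutions, BasicBlock holds 2; add the stem conv and the final Dense layer:

```python
# Counted conv/fc layers for each ResNet variant's block configuration.
def resnet_depth(layers, convs_per_block):
    return convs_per_block * sum(layers) + 2  # + stem conv + classifier

print(resnet_depth([3, 4, 23, 3], 3))  # 101 (resnet101, Bottleneck)
print(resnet_depth([3, 8, 36, 3], 3))  # 152 (resnet152, Bottleneck)
print(resnet_depth([3, 4, 6, 3], 3))   # 50  (resnet50, Bottleneck)
print(resnet_depth([2, 2, 2, 2], 2))   # 18  (resnet18, BasicBlock)
```

The same configurations are reused by the ResNeXt factories below, which only change `groups` and `base_width`.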
mindocr.models.backbones.mindcv_models.resnet.resnet152(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get 152 layers ResNet model. Refer to the base class models.ResNet for more details.

Source code in mindocr\models\backbones\mindcv_models\resnet.py
@register_model
def resnet152(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs):
    """Get 152 layers ResNet model.
    Refer to the base class `models.ResNet` for more details.
    """
    default_cfg = default_cfgs["resnet152"]
    model = ResNet(Bottleneck, [3, 8, 36, 3], num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.resnet.resnet18(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get 18 layers ResNet model. Refer to the base class models.ResNet for more details.

Source code in mindocr\models\backbones\mindcv_models\resnet.py
@register_model
def resnet18(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs):
    """Get 18 layers ResNet model.
    Refer to the base class `models.ResNet` for more details.
    """
    default_cfg = default_cfgs["resnet18"]
    model = ResNet(BasicBlock, [2, 2, 2, 2], num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.resnet.resnet34(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get 34 layers ResNet model. Refer to the base class models.ResNet for more details.

Source code in mindocr\models\backbones\mindcv_models\resnet.py
@register_model
def resnet34(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs):
    """Get 34 layers ResNet model.
    Refer to the base class `models.ResNet` for more details.
    """
    default_cfg = default_cfgs["resnet34"]
    model = ResNet(BasicBlock, [3, 4, 6, 3], num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.resnet.resnet50(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get 50 layers ResNet model. Refer to the base class models.ResNet for more details.

Source code in mindocr\models\backbones\mindcv_models\resnet.py
@register_model
def resnet50(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs):
    """Get 50 layers ResNet model.
    Refer to the base class `models.ResNet` for more details.
    """
    default_cfg = default_cfgs["resnet50"]
    model = ResNet(Bottleneck, [3, 4, 6, 3], num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.resnet.resnext101_32x4d(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get 101 layers ResNeXt model with 32 groups of GPConv. Refer to the base class models.ResNet for more details.

Source code in mindocr\models\backbones\mindcv_models\resnet.py
@register_model
def resnext101_32x4d(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs):
    """Get 101 layers ResNeXt model with 32 groups of GPConv.
    Refer to the base class `models.ResNet` for more details.
    """
    default_cfg = default_cfgs["resnext101_32x4d"]
    model = ResNet(Bottleneck, [3, 4, 23, 3], groups=32, base_width=4, num_classes=num_classes,
                   in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.resnet.resnext101_64x4d(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get 101 layers ResNeXt model with 64 groups of GPConv. Refer to the base class models.ResNet for more details.

Source code in mindocr\models\backbones\mindcv_models\resnet.py
@register_model
def resnext101_64x4d(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs):
    """Get 101 layers ResNeXt model with 64 groups of GPConv.
    Refer to the base class `models.ResNet` for more details.
    """
    default_cfg = default_cfgs["resnext101_64x4d"]
    model = ResNet(Bottleneck, [3, 4, 23, 3], groups=64, base_width=4, num_classes=num_classes,
                   in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.resnet.resnext50_32x4d(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get 50 layers ResNeXt model with 32 groups of GPConv. Refer to the base class models.ResNet for more details.

Source code in mindocr\models\backbones\mindcv_models\resnet.py
@register_model
def resnext50_32x4d(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs):
    """Get 50 layers ResNeXt model with 32 groups of GPConv.
    Refer to the base class `models.ResNet` for more details.
    """
    default_cfg = default_cfgs["resnext50_32x4d"]
    model = ResNet(Bottleneck, [3, 4, 6, 3], groups=32, base_width=4, num_classes=num_classes,
                   in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.resnetv2

MindSpore implementation of ResNetV2. Refer to Identity Mappings in Deep Residual Networks.

mindocr.models.backbones.mindcv_models.resnetv2.resnetv2_101(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get 101 layers ResNetV2 model. Refer to the base class models.ResNet for more details.

Source code in mindocr\models\backbones\mindcv_models\resnetv2.py
@register_model
def resnetv2_101(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs):
    """Get 101 layers ResNetV2 model.
    Refer to the base class `models.ResNet` for more details.
    """
    default_cfg = default_cfgs["resnetv2_101"]
    model = ResNet(PreActBottleneck, [3, 4, 23, 3], num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.resnetv2.resnetv2_50(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get 50 layers ResNetV2 model. Refer to the base class models.ResNet for more details.

Source code in mindocr\models\backbones\mindcv_models\resnetv2.py
@register_model
def resnetv2_50(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs):
    """Get 50 layers ResNetV2 model.
    Refer to the base class `models.ResNet` for more details.
    """
    default_cfg = default_cfgs['resnetv2_50']
    model = ResNet(PreActBottleneck, [3, 4, 6, 3], num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.rexnet

MindSpore implementation of ReXNet. Refer to ReXNet: Rethinking Channel Dimensions for Efficient Model Design.

mindocr.models.backbones.mindcv_models.rexnet.LinearBottleneck

Bases: nn.Cell

LinearBottleneck

Source code in mindocr\models\backbones\mindcv_models\rexnet.py
class LinearBottleneck(nn.Cell):
    """LinearBottleneck"""

    def __init__(
        self,
        in_channels,
        out_channels,
        exp_ratio,
        stride,
        use_se=True,
        se_ratio=1 / 12,
        ch_div=1,
        act_layer=nn.SiLU,
        dw_act_layer=nn.ReLU6,
        drop_path=None,
        **kwargs,
    ):
        super(LinearBottleneck, self).__init__(**kwargs)
        self.use_shortcut = stride == 1 and in_channels <= out_channels
        self.in_channels = in_channels
        self.out_channels = out_channels

        if exp_ratio != 1:
            dw_channels = in_channels * exp_ratio
            self.conv_exp = Conv2dNormActivation(in_channels, dw_channels, 1, activation=act_layer)
        else:
            dw_channels = in_channels
            self.conv_exp = None

        self.conv_dw = Conv2dNormActivation(dw_channels, dw_channels, 3, stride, padding=1,
                                            groups=dw_channels, activation=None)

        if use_se:
            self.se = SqueezeExcite(dw_channels,
                                    rd_channels=make_divisible(int(dw_channels * se_ratio), ch_div),
                                    norm=nn.BatchNorm2d)
        else:
            self.se = None
        self.act_dw = dw_act_layer()

        self.conv_pwl = Conv2dNormActivation(dw_channels, out_channels, 1, padding=0, activation=None)
        self.drop_path = drop_path

    def construct(self, x):
        shortcut = x
        if self.conv_exp is not None:
            x = self.conv_exp(x)
        x = self.conv_dw(x)
        if self.se is not None:
            x = self.se(x)
        x = self.act_dw(x)
        x = self.conv_pwl(x)
        if self.use_shortcut:
            if self.drop_path is not None:
                x = self.drop_path(x)
            x[:, 0:self.in_channels] += shortcut
        return x
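Note the partial residual in `construct`: because `out_channels` may exceed `in_channels`, the shortcut is added only to the first `in_channels` channels of the output (`x[:, 0:self.in_channels] += shortcut`). A plain-Python sketch of that slice-add, with channels flattened to 1-D lists for illustration:

```python
def partial_shortcut_add(out, shortcut):
    """Add `shortcut` to the first len(shortcut) entries of `out`,
    mirroring LinearBottleneck's `x[:, 0:self.in_channels] += shortcut`
    (channel dimension reduced to a flat list for illustration)."""
    assert len(shortcut) <= len(out), "shortcut must not have more channels"
    return [o + s for o, s in zip(out, shortcut)] + out[len(shortcut):]

# The block widened 3 channels to 5: only the first 3 receive the residual.
print(partial_shortcut_add([1.0, 1.0, 1.0, 1.0, 1.0], [0.5, 0.5, 0.5]))
# [1.5, 1.5, 1.5, 1.0, 1.0]
```

This is why `use_shortcut` only requires `in_channels <= out_channels` rather than equality.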
mindocr.models.backbones.mindcv_models.rexnet.ReXNetV1

Bases: nn.Cell

ReXNet model class, based on "Rethinking Channel Dimensions for Efficient Model Design" <https://arxiv.org/abs/2007.00992>_

PARAMETER DESCRIPTION
in_channels

number of the input channels. Default: 3.

TYPE: int DEFAULT: 3

fi_channels

number of the final channels. Default: 180.

TYPE: int DEFAULT: 180

initial_channels

initialize inplanes. Default: 16.

TYPE: int DEFAULT: 16

width_mult

channel width multiplier. Default: 1.0.

TYPE: float DEFAULT: 1.0

depth_mult

depth multiplier for the number of layers. Default: 1.0.

TYPE: float DEFAULT: 1.0

num_classes

number of classification classes. Default: 1000.

TYPE: int DEFAULT: 1000

use_se

use SENet in LinearBottleneck. Default: True.

TYPE: bool DEFAULT: True

se_ratio

SENet reduction ratio. Default: 1/12.

TYPE: float DEFAULT: 1 / 12

drop_rate

dropout ratio. Default: 0.2.

TYPE: float DEFAULT: 0.2

ch_div

round channel counts to be divisible by ch_div. Default: 1.

TYPE: int DEFAULT: 1

act_layer

activation function in ConvNormAct. Default: nn.SiLU.

TYPE: nn.Cell DEFAULT: nn.SiLU

dw_act_layer

activation function after dw_conv. Default: nn.ReLU6.

TYPE: nn.Cell DEFAULT: nn.ReLU6

cls_useconv

use conv in classification. Default: False.

TYPE: bool DEFAULT: False

Source code in mindocr\models\backbones\mindcv_models\rexnet.py
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
class ReXNetV1(nn.Cell):
    r"""ReXNet model class, based on
    `"Rethinking Channel Dimensions for Efficient Model Design" <https://arxiv.org/abs/2007.00992>`_

    Args:
        in_channels (int): number of the input channels. Default: 3.
        fi_channels (int): number of the final channels. Default: 180.
        initial_channels (int): initialize inplanes. Default: 16.
        width_mult (float): channel width multiplier. Default: 1.0.
        depth_mult (float): depth multiplier for the number of layers. Default: 1.0.
        num_classes (int): number of classification classes. Default: 1000.
        use_se (bool): use SENet in LinearBottleneck. Default: True.
        se_ratio (float): SENet reduction ratio. Default: 1/12.
        drop_rate (float): dropout ratio. Default: 0.2.
        ch_div (int): round channel counts to be divisible by ch_div. Default: 1.
        act_layer (nn.Cell): activation function in ConvNormAct. Default: nn.SiLU.
        dw_act_layer (nn.Cell): activation function after dw_conv. Default: nn.ReLU6.
        cls_useconv (bool): use conv in classification. Default: False.
    """

    def __init__(
        self,
        in_channels=3,
        fi_channels=180,
        initial_channels=16,
        width_mult=1.0,
        depth_mult=1.0,
        num_classes=1000,
        use_se=True,
        se_ratio=1 / 12,
        drop_rate=0.2,
        drop_path_rate=0.0,
        ch_div=1,
        act_layer=nn.SiLU,
        dw_act_layer=nn.ReLU6,
        cls_useconv=False,
    ):
        super(ReXNetV1, self).__init__()

        layers = [1, 2, 2, 3, 3, 5]
        strides = [1, 2, 2, 2, 1, 2]
        use_ses = [False, False, True, True, True, True]

        layers = [ceil(element * depth_mult) for element in layers]
        strides = sum([[element] + [1] * (layers[idx] - 1)
                       for idx, element in enumerate(strides)], [])
        if use_se:
            use_ses = sum([[element] * layers[idx] for idx, element in enumerate(use_ses)], [])
        else:
            use_ses = [False] * sum(layers[:])
        exp_ratios = [1] * layers[0] + [6] * sum(layers[1:])

        self.depth = sum(layers[:]) * 3
        stem_channel = 32 / width_mult if width_mult < 1.0 else 32
        inplanes = initial_channels / width_mult if width_mult < 1.0 else initial_channels

        features = []
        in_channels_group = []
        out_channels_group = []

        for i in range(self.depth // 3):
            if i == 0:
                in_channels_group.append(int(round(stem_channel * width_mult)))
                out_channels_group.append(int(round(inplanes * width_mult)))
            else:
                in_channels_group.append(int(round(inplanes * width_mult)))
                inplanes += fi_channels / (self.depth // 3 * 1.0)
                out_channels_group.append(int(round(inplanes * width_mult)))

        stem_chs = make_divisible(round(stem_channel * width_mult), divisor=ch_div)
        self.stem = Conv2dNormActivation(in_channels, stem_chs, stride=2, padding=1, activation=act_layer)

        feat_chs = [stem_chs]
        feature_info = []
        curr_stride = 2
        features = []
        num_blocks = len(in_channels_group)
        for block_idx, (in_c, out_c, exp_ratio, stride, use_se) in enumerate(
            zip(in_channels_group, out_channels_group, exp_ratios, strides, use_ses)
        ):
            if stride > 1:
                fname = "stem" if block_idx == 0 else f"features.{block_idx - 1}"
                feature_info += [dict(num_chs=feat_chs[-1], reduction=curr_stride, module=fname)]
                curr_stride *= stride
            block_dpr = drop_path_rate * block_idx / (num_blocks - 1)  # stochastic depth linear decay rule
            drop_path = DropPath(block_dpr) if block_dpr > 0. else None
            features.append(LinearBottleneck(in_channels=in_c,
                                             out_channels=out_c,
                                             exp_ratio=exp_ratio,
                                             stride=stride,
                                             use_se=use_se,
                                             se_ratio=se_ratio,
                                             act_layer=act_layer,
                                             dw_act_layer=dw_act_layer,
                                             drop_path=drop_path))

        pen_channels = make_divisible(int(1280 * width_mult), divisor=ch_div)
        features.append(Conv2dNormActivation(out_channels_group[-1],
                                             pen_channels,
                                             kernel_size=1,
                                             activation=act_layer))

        features.append(GlobalAvgPooling(keep_dims=True))
        self.useconv = cls_useconv
        self.features = nn.SequentialCell(*features)
        if self.useconv:
            self.cls = nn.SequentialCell(
                nn.Dropout(1.0 - drop_rate),
                nn.Conv2d(pen_channels, num_classes, 1, has_bias=True))
        else:
            self.cls = nn.SequentialCell(
                nn.Dropout(1.0 - drop_rate),
                nn.Dense(pen_channels, num_classes))
        self._initialize_weights()

    def _initialize_weights(self) -> None:
        """Initialize weights for cells."""
        for _, cell in self.cells_and_names():
            if isinstance(cell, (nn.Conv2d, nn.Dense)):
                cell.weight.set_data(
                    init.initializer(init.HeUniform(math.sqrt(5), mode="fan_in", nonlinearity="relu"),
                                     cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(
                        init.initializer(init.HeUniform(math.sqrt(5), mode="fan_in", nonlinearity="leaky_relu"),
                                         [1, cell.bias.shape[0]], cell.bias.dtype).reshape((-1)))

    def forward_features(self, x):
        x = self.stem(x)
        x = self.features(x)
        return x

    def forward_head(self, x):
        if not self.useconv:
            x = x.reshape((x.shape[0], -1))
            x = self.cls(x)
        else:
            x = self.cls(x).reshape((x.shape[0], -1))
        return x

    def construct(self, x):
        x = self.forward_features(x)
        x = self.forward_head(x)
        return x
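The channel schedule built in `__init__` is ReXNet's signature design: `inplanes` grows linearly by `fi_channels / num_blocks` per block before rounding, instead of doubling per stage. The computation can be replicated standalone in pure Python (no MindSpore needed), following the loop above:

```python
from math import ceil

def rexnet_channel_schedule(width_mult=1.0, depth_mult=1.0,
                            fi_channels=180, initial_channels=16):
    """Replicate the in/out channel-group computation from ReXNetV1.__init__."""
    layers = [ceil(n * depth_mult) for n in [1, 2, 2, 3, 3, 5]]
    num_blocks = sum(layers)
    stem_channel = 32 / width_mult if width_mult < 1.0 else 32
    inplanes = initial_channels / width_mult if width_mult < 1.0 else initial_channels

    in_chs, out_chs = [], []
    for i in range(num_blocks):
        if i == 0:
            in_chs.append(int(round(stem_channel * width_mult)))
        else:
            in_chs.append(int(round(inplanes * width_mult)))
            inplanes += fi_channels / num_blocks  # linear channel growth
        out_chs.append(int(round(inplanes * width_mult)))
    return in_chs, out_chs

in_chs, out_chs = rexnet_channel_schedule()
print(len(in_chs), in_chs[:3], out_chs[:3], out_chs[-1])
# 16 [32, 16, 27] [16, 27, 38] 185
```

With the defaults this yields 16 blocks whose output width creeps from 16 up to 185 channels, rather than jumping at stage boundaries.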
mindocr.models.backbones.mindcv_models.rexnet.rexnet_x09(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get ReXNet model with width multiplier of 0.9. Refer to the base class models.ReXNetV1 for more details.

Source code in mindocr\models\backbones\mindcv_models\rexnet.py
269
270
271
272
273
274
@register_model
def rexnet_x09(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> ReXNetV1:
    """Get ReXNet model with width multiplier of 0.9.
    Refer to the base class `models.ReXNetV1` for more details.
    """
    return _rexnet("rexnet_x09", 0.9, in_channels, num_classes, pretrained, **kwargs)
mindocr.models.backbones.mindcv_models.rexnet.rexnet_x10(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get ReXNet model with width multiplier of 1.0. Refer to the base class models.ReXNetV1 for more details.

Source code in mindocr\models\backbones\mindcv_models\rexnet.py
277
278
279
280
281
282
@register_model
def rexnet_x10(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> ReXNetV1:
    """Get ReXNet model with width multiplier of 1.0.
    Refer to the base class `models.ReXNetV1` for more details.
    """
    return _rexnet("rexnet_x10", 1.0, in_channels, num_classes, pretrained, **kwargs)
mindocr.models.backbones.mindcv_models.rexnet.rexnet_x13(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get ReXNet model with width multiplier of 1.3. Refer to the base class models.ReXNetV1 for more details.

Source code in mindocr\models\backbones\mindcv_models\rexnet.py
285
286
287
288
289
290
@register_model
def rexnet_x13(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> ReXNetV1:
    """Get ReXNet model with width multiplier of 1.3.
    Refer to the base class `models.ReXNetV1` for more details.
    """
    return _rexnet("rexnet_x13", 1.3, in_channels, num_classes, pretrained, **kwargs)
mindocr.models.backbones.mindcv_models.rexnet.rexnet_x15(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get ReXNet model with width multiplier of 1.5. Refer to the base class models.ReXNetV1 for more details.

Source code in mindocr\models\backbones\mindcv_models\rexnet.py
293
294
295
296
297
298
@register_model
def rexnet_x15(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> ReXNetV1:
    """Get ReXNet model with width multiplier of 1.5.
    Refer to the base class `models.ReXNetV1` for more details.
    """
    return _rexnet("rexnet_x15", 1.5, in_channels, num_classes, pretrained, **kwargs)
mindocr.models.backbones.mindcv_models.rexnet.rexnet_x20(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get ReXNet model with width multiplier of 2.0. Refer to the base class models.ReXNetV1 for more details.

Source code in mindocr\models\backbones\mindcv_models\rexnet.py
301
302
303
304
305
306
@register_model
def rexnet_x20(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> ReXNetV1:
    """Get ReXNet model with width multiplier of 2.0.
    Refer to the base class `models.ReXNetV1` for more details.
    """
    return _rexnet("rexnet_x20", 2.0, in_channels, num_classes, pretrained, **kwargs)
mindocr.models.backbones.mindcv_models.senet

MindSpore implementation of SENet. Refer to Squeeze-and-Excitation Networks.

mindocr.models.backbones.mindcv_models.senet.Bottleneck

Bases: nn.Cell

Define the base block class for [SENet, SEResNet, SEResNeXt] bottlenecks that implements the construct method.

Source code in mindocr\models\backbones\mindcv_models\senet.py
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
class Bottleneck(nn.Cell):
    """
    Define the base block class for [SENet, SEResNet, SEResNeXt] bottlenecks
    that implements the `construct` method.
    """

    def construct(self, x: Tensor) -> Tensor:
        shortcut = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            shortcut = self.downsample(x)

        out = self.se_module(out) + shortcut
        out = self.relu(out)

        return out
mindocr.models.backbones.mindcv_models.senet.SEBottleneck

Bases: Bottleneck

Define the Bottleneck for SENet154.

Source code in mindocr\models\backbones\mindcv_models\senet.py
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
class SEBottleneck(Bottleneck):
    """
    Define the Bottleneck for SENet154.
    """

    expansion: int = 4

    def __init__(
        self,
        in_channels: int,
        channels: int,
        group: int,
        reduction: int,
        stride: int = 1,
        downsample: Optional[nn.SequentialCell] = None,
    ) -> None:
        super(SEBottleneck, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, channels * 2, kernel_size=1, pad_mode="pad",
                               padding=0, has_bias=False)
        self.bn1 = nn.BatchNorm2d(channels * 2)
        self.conv2 = nn.Conv2d(channels * 2, channels * 4, kernel_size=3, stride=stride,
                               pad_mode="pad", padding=1, group=group, has_bias=False)
        self.bn2 = nn.BatchNorm2d(channels * 4)
        self.conv3 = nn.Conv2d(channels * 4, channels * 4, kernel_size=1, pad_mode="pad",
                               padding=0, has_bias=False)
        self.bn3 = nn.BatchNorm2d(channels * 4)
        self.relu = nn.ReLU()
        self.se_module = SqueezeExciteV2(channels * 4, rd_ratio=1.0 / reduction)
        self.downsample = downsample
        self.stride = stride
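Unlike the plain ResNet bottleneck, SENet-154's `SEBottleneck` widens immediately: the first 1x1 conv outputs `channels * 2` and the grouped 3x3 conv already runs at `channels * 4`. A small sketch of that channel plan (a helper written here for illustration, not part of the library):

```python
def se_bottleneck_channels(in_channels, channels):
    """(in, out) channel widths of the three convs in SEBottleneck (SENet-154)."""
    return [(in_channels, channels * 2),   # 1x1 expansion
            (channels * 2, channels * 4),  # 3x3 grouped conv
            (channels * 4, channels * 4)]  # final 1x1

print(se_bottleneck_channels(64, 64))
# [(64, 128), (128, 256), (256, 256)]
```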
mindocr.models.backbones.mindcv_models.senet.SENet

Bases: nn.Cell

SENet model class, based on "Squeeze-and-Excitation Networks" <https://arxiv.org/abs/1709.01507>_

PARAMETER DESCRIPTION
block

block class of SENet.

TYPE: Type[Union[SEBottleneck, SEResNetBottleneck, SEResNetBlock, SEResNeXtBottleneck]]

layers

Number of residual blocks in each of the 4 layers.

TYPE: List[int]

group

Number of groups for the conv in each bottleneck block.

TYPE: int

reduction

Reduction ratio for Squeeze-and-Excitation modules.

TYPE: int

drop_rate

Drop probability for the Dropout layer. Default: 0.

TYPE: float DEFAULT: 0.0

in_channels

number of input channels. Default: 3.

TYPE: int DEFAULT: 3

inplanes

Number of input channels for layer1. Default: 64.

TYPE: int DEFAULT: 64

input3x3

If True, use three 3x3 convolutions in layer0. Default: False.

TYPE: bool DEFAULT: False

downsample_kernel_size

Kernel size for downsampling convolutions. Default: 1.

TYPE: int DEFAULT: 1

downsample_padding

Padding for downsampling convolutions. Default: 0.

TYPE: int DEFAULT: 0

num_classes

number of classification classes. Default: 1000.

TYPE: int DEFAULT: 1000

Source code in mindocr\models\backbones\mindcv_models\senet.py
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
class SENet(nn.Cell):
    r"""SENet model class, based on
    `"Squeeze-and-Excitation Networks" <https://arxiv.org/abs/1709.01507>`_

    Args:
        block: block class of SENet.
        layers: Number of residual blocks in each of the 4 layers.
        group: Number of groups for the conv in each bottleneck block.
        reduction: Reduction ratio for Squeeze-and-Excitation modules.
        drop_rate: Drop probability for the Dropout layer. Default: 0.
        in_channels: number of input channels. Default: 3.
        inplanes: Number of input channels for layer1. Default: 64.
        input3x3: If `True`, use three 3x3 convolutions in layer0. Default: False.
        downsample_kernel_size: Kernel size for downsampling convolutions. Default: 1.
        downsample_padding: Padding for downsampling convolutions. Default: 0.
        num_classes (int): number of classification classes. Default: 1000.
    """

    def __init__(
        self,
        block: Type[Union[SEBottleneck, SEResNetBottleneck, SEResNetBlock, SEResNeXtBottleneck]],
        layers: List[int],
        group: int,
        reduction: int,
        drop_rate: float = 0.0,
        in_channels: int = 3,
        inplanes: int = 64,
        input3x3: bool = False,
        downsample_kernel_size: int = 1,
        downsample_padding: int = 0,
        num_classes: int = 1000,
    ) -> None:
        super(SENet, self).__init__()
        self.inplanes = inplanes
        self.num_classes = num_classes
        self.drop_rate = drop_rate
        if input3x3:
            self.layer0 = nn.SequentialCell([
                nn.Conv2d(in_channels, 64, 3, stride=2, pad_mode="pad", padding=1, has_bias=False),
                nn.BatchNorm2d(64),
                nn.ReLU(),
                nn.Conv2d(64, 64, 3, stride=1, pad_mode="pad", padding=1, has_bias=False),
                nn.BatchNorm2d(64),
                nn.ReLU(),
                nn.Conv2d(64, inplanes, 3, stride=1, pad_mode="pad", padding=1, has_bias=False),
                nn.BatchNorm2d(inplanes),
                nn.ReLU()
            ])
        else:
            self.layer0 = nn.SequentialCell([
                nn.Conv2d(in_channels, inplanes, kernel_size=7, stride=2, pad_mode="pad",
                          padding=3, has_bias=False),
                nn.BatchNorm2d(inplanes),
                nn.ReLU()
            ])
        self.pool0 = nn.MaxPool2d(3, stride=2, pad_mode="same")

        self.layer1 = self._make_layer(block, planes=64, blocks=layers[0], group=group,
                                       reduction=reduction, downsample_kernel_size=1,
                                       downsample_padding=0)

        self.layer2 = self._make_layer(block, planes=128, blocks=layers[1], stride=2,
                                       group=group, reduction=reduction,
                                       downsample_kernel_size=downsample_kernel_size,
                                       downsample_padding=downsample_padding)

        self.layer3 = self._make_layer(block, planes=256, blocks=layers[2], stride=2,
                                       group=group, reduction=reduction,
                                       downsample_kernel_size=downsample_kernel_size,
                                       downsample_padding=downsample_padding)

        self.layer4 = self._make_layer(block, planes=512, blocks=layers[3], stride=2,
                                       group=group, reduction=reduction,
                                       downsample_kernel_size=downsample_kernel_size,
                                       downsample_padding=downsample_padding)

        self.num_features = 512 * block.expansion

        self.pool = GlobalAvgPooling()
        if self.drop_rate > 0.:
            self.dropout = nn.Dropout(keep_prob=1. - self.drop_rate)
        self.classifier = nn.Dense(self.num_features, self.num_classes)

        self._initialize_weights()

    def _make_layer(
        self,
        block: Type[Union[SEBottleneck, SEResNetBottleneck, SEResNetBlock, SEResNeXtBottleneck]],
        planes: int,
        blocks: int,
        group: int,
        reduction: int,
        stride: int = 1,
        downsample_kernel_size: int = 1,
        downsample_padding: int = 0,
    ) -> nn.SequentialCell:
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.SequentialCell([
                nn.Conv2d(self.inplanes, planes * block.expansion, kernel_size=downsample_kernel_size,
                          stride=stride, pad_mode="pad", padding=downsample_padding, has_bias=False),
                nn.BatchNorm2d(planes * block.expansion)
            ])

        layers = [block(self.inplanes, planes, group, reduction, stride, downsample)]
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes, group, reduction))

        return nn.SequentialCell(layers)

    def _initialize_weights(self) -> None:
        """Initialize weights for cells."""
        for _, cell in self.cells_and_names():
            if isinstance(cell, nn.Conv2d):
                cell.weight.set_data(
                    init.initializer(init.HeNormal(mode="fan_out", nonlinearity="relu"),
                                     cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(
                        init.initializer("zeros", cell.bias.shape, cell.bias.dtype))
            elif isinstance(cell, nn.BatchNorm2d):
                cell.gamma.set_data(init.initializer("ones", cell.gamma.shape, cell.gamma.dtype))
                cell.beta.set_data(init.initializer("zeros", cell.beta.shape, cell.beta.dtype))
            elif isinstance(cell, nn.Dense):
                cell.weight.set_data(
                    init.initializer(init.HeUniform(mode="fan_in", nonlinearity="sigmoid"),
                                     cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(init.initializer("zeros", cell.bias.shape, cell.bias.dtype))

    def forward_features(self, x: Tensor) -> Tensor:
        x = self.layer0(x)
        x = self.pool0(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        return x

    def forward_head(self, x: Tensor) -> Tensor:
        x = self.pool(x)
        if self.drop_rate > 0.0:
            x = self.dropout(x)
        x = self.classifier(x)
        return x

    def construct(self, x: Tensor) -> Tensor:
        x = self.forward_features(x)
        x = self.forward_head(x)
        return x
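`_make_layer` mutates `self.inplanes` as it builds each stage and inserts a downsample projection whenever `stride != 1` or the channel count changes. A pure-Python trace of that bookkeeping for an SE-ResNet-50-style configuration (layers [3, 4, 6, 3], bottleneck `expansion = 4`, default `inplanes = 64` are assumptions for the example):

```python
def trace_inplanes(inplanes=64, expansion=4):
    """Track SENet._make_layer's `self.inplanes` updates across layer1..layer4
    and where a downsample projection is required."""
    planes_per_stage = [64, 128, 256, 512]
    strides = [1, 2, 2, 2]  # layer1 keeps stride 1, layer2-4 downsample
    needs_downsample = []
    for planes, stride in zip(planes_per_stage, strides):
        # Same condition as in _make_layer:
        needs_downsample.append(stride != 1 or inplanes != planes * expansion)
        inplanes = planes * expansion  # updated after the first block of the stage
    return inplanes, needs_downsample

num_features, ds = trace_inplanes()
print(num_features, ds)
# 2048 [True, True, True, True]
```

The final value matches `self.num_features = 512 * block.expansion`, the width fed to the classifier.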
mindocr.models.backbones.mindcv_models.senet.SEResNeXtBottleneck

Bases: Bottleneck

Define the ResNeXt bottleneck type C with a Squeeze-and-Excitation module.

Source code in mindocr\models\backbones\mindcv_models\senet.py
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
class SEResNeXtBottleneck(Bottleneck):
    """
    Define the ResNeXt bottleneck type C with a Squeeze-and-Excitation module.
    """

    expansion: int = 4

    def __init__(
        self,
        in_channels: int,
        channels: int,
        group: int,
        reduction: int,
        stride: int = 1,
        downsample: Optional[nn.SequentialCell] = None,
        base_width: int = 4,
    ) -> None:
        super(SEResNeXtBottleneck, self).__init__()
        width = math.floor(channels * (base_width / 64)) * group
        self.conv1 = nn.Conv2d(in_channels, width, kernel_size=1, stride=1, pad_mode="pad",
                               padding=0, has_bias=False)
        self.bn1 = nn.BatchNorm2d(width)
        self.conv2 = nn.Conv2d(width, width, kernel_size=3, stride=stride, pad_mode="pad",
                               padding=1, group=group, has_bias=False)
        self.bn2 = nn.BatchNorm2d(width)
        self.conv3 = nn.Conv2d(width, channels * 4, kernel_size=1, pad_mode="pad", padding=0,
                               has_bias=False)
        self.bn3 = nn.BatchNorm2d(channels * 4)
        self.relu = nn.ReLU()
        self.se_module = SqueezeExciteV2(channels * 4, rd_ratio=1.0 / reduction)
        self.downsample = downsample
        self.stride = stride
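The grouped-conv width here is `width = floor(channels * base_width / 64) * group`. For the common ResNeXt "32x4d" setting (`group=32`, `base_width=4`, an assumed configuration for this example) that doubles the 3x3 conv width relative to a plain bottleneck:

```python
import math

def resnext_width(channels, base_width=4, group=32):
    """Width of the grouped 3x3 conv in SEResNeXtBottleneck."""
    return math.floor(channels * (base_width / 64)) * group

# 32x4d widths across the four stages (channels = 64, 128, 256, 512):
print([resnext_width(c) for c in (64, 128, 256, 512)])
# [128, 256, 512, 1024]
```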
mindocr.models.backbones.mindcv_models.senet.SEResNetBlock

Bases: nn.Cell

Define the basic block of resnet with a Squeeze-and-Excitation module.

Source code in mindocr\models\backbones\mindcv_models\senet.py
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
class SEResNetBlock(nn.Cell):
    """
    Define the basic block of resnet with a Squeeze-and-Excitation module.
    """

    expansion = 1

    def __init__(
        self,
        in_channels: int,
        channels: int,
        group: int,
        reduction: int,
        stride: int = 1,
        downsample: Optional[nn.SequentialCell] = None,
    ) -> None:
        super(SEResNetBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, channels, kernel_size=3, stride=stride, pad_mode="pad",
                               padding=1, has_bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, pad_mode="pad", padding=1,
                               group=group, has_bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()
        self.se_module = SqueezeExciteV2(channels, rd_ratio=1.0 / reduction)
        self.downsample = downsample
        self.stride = stride

    def construct(self, x: Tensor) -> Tensor:
        shortcut = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            shortcut = self.downsample(x)

        out = self.se_module(out) + shortcut
        out = self.relu(out)

        return out
mindocr.models.backbones.mindcv_models.senet.SEResNetBottleneck

Bases: Bottleneck

Define the ResNet bottleneck with a Squeeze-and-Excitation module; this bottleneck variant is the one used in the torchvision implementation of ResNet.

Source code in mindocr\models\backbones\mindcv_models\senet.py
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
class SEResNetBottleneck(Bottleneck):
    """
    Define the ResNet bottleneck with a Squeeze-and-Excitation module;
    this bottleneck variant is the one used in the torchvision implementation of ResNet.
    """

    expansion: int = 4

    def __init__(
        self,
        in_channels: int,
        channels: int,
        group: int,
        reduction: int,
        stride: int = 1,
        downsample: Optional[nn.SequentialCell] = None,
    ) -> None:
        super(SEResNetBottleneck, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, channels, kernel_size=1, pad_mode="pad",
                               padding=0, has_bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, stride=stride, pad_mode="pad",
                               padding=1, group=group, has_bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv3 = nn.Conv2d(channels, channels * 4, kernel_size=1, pad_mode="pad", padding=0,
                               has_bias=False)
        self.bn3 = nn.BatchNorm2d(channels * 4)
        self.relu = nn.ReLU()
        self.se_module = SqueezeExciteV2(channels * 4, rd_ratio=1.0 / reduction)
        self.downsample = downsample
        self.stride = stride
mindocr.models.backbones.mindcv_models.shufflenetv1

MindSpore implementation of ShuffleNetV1. Refer to ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices.

mindocr.models.backbones.mindcv_models.shufflenetv1.ShuffleNetV1

Bases: nn.Cell

ShuffleNetV1 model class, based on "ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices" <https://arxiv.org/abs/1707.01083>_

PARAMETER DESCRIPTION
num_classes

number of classification classes. Default: 1000.

TYPE: int DEFAULT: 1000

in_channels

number of input channels. Default: 3.

TYPE: int DEFAULT: 3

model_size

scale factor which controls the number of channels. Default: '2.0x'.

TYPE: str DEFAULT: '2.0x'

group

number of groups for group convolution. Default: 3.

TYPE: int DEFAULT: 3

Source code in mindocr\models\backbones\mindcv_models\shufflenetv1.py
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
class ShuffleNetV1(nn.Cell):
    r"""ShuffleNetV1 model class, based on
    `"ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices" <https://arxiv.org/abs/1707.01083>`_  # noqa: E501

    Args:
        num_classes: number of classification classes. Default: 1000.
        in_channels: number of input channels. Default: 3.
        model_size: scale factor which controls the number of channels. Default: '2.0x'.
        group: number of groups for group convolution. Default: 3.
    """

    def __init__(
        self,
        num_classes: int = 1000,
        in_channels: int = 3,
        model_size: str = "2.0x",
        group: int = 3,
    ):
        super().__init__()
        self.stage_repeats = [4, 8, 4]
        self.model_size = model_size
        if group == 3:
            if model_size == "0.5x":
                self.stage_out_channels = [-1, 12, 120, 240, 480]
            elif model_size == "1.0x":
                self.stage_out_channels = [-1, 24, 240, 480, 960]
            elif model_size == "1.5x":
                self.stage_out_channels = [-1, 24, 360, 720, 1440]
            elif model_size == "2.0x":
                self.stage_out_channels = [-1, 48, 480, 960, 1920]
            else:
                raise NotImplementedError
        elif group == 8:
            if model_size == "0.5x":
                self.stage_out_channels = [-1, 16, 192, 384, 768]
            elif model_size == "1.0x":
                self.stage_out_channels = [-1, 24, 384, 768, 1536]
            elif model_size == "1.5x":
                self.stage_out_channels = [-1, 24, 576, 1152, 2304]
            elif model_size == "2.0x":
                self.stage_out_channels = [-1, 48, 768, 1536, 3072]
            else:
                raise NotImplementedError

        # building first layer
        input_channel = self.stage_out_channels[1]
        self.first_conv = nn.SequentialCell(
            nn.Conv2d(in_channels, input_channel, kernel_size=3, stride=2, pad_mode="pad", padding=1),
            nn.BatchNorm2d(input_channel),
            nn.ReLU(),
        )
        self.max_pool = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode="same")

        features = []
        for idxstage, numrepeat in enumerate(self.stage_repeats):
            output_channel = self.stage_out_channels[idxstage + 2]
            for i in range(numrepeat):
                stride = 2 if i == 0 else 1
                first_group = idxstage == 0 and i == 0
                features.append(ShuffleV1Block(input_channel, output_channel,
                                               group=group, first_group=first_group,
                                               mid_channels=output_channel // 4, stride=stride))
                input_channel = output_channel

        self.features = nn.SequentialCell(features)
        self.global_pool = GlobalAvgPooling()
        self.classifier = nn.Dense(self.stage_out_channels[-1], num_classes, has_bias=False)
        self._initialize_weights()

    def _initialize_weights(self):
        """Initialize weights for cells."""
        for name, cell in self.cells_and_names():
            if isinstance(cell, nn.Conv2d):
                if "first" in name:
                    cell.weight.set_data(
                        init.initializer(init.Normal(0.01, 0), cell.weight.shape, cell.weight.dtype))
                else:
                    cell.weight.set_data(
                        init.initializer(init.Normal(1.0 / cell.weight.shape[1], 0), cell.weight.shape,
                                         cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(
                        init.initializer("zeros", cell.bias.shape, cell.bias.dtype))
            elif isinstance(cell, nn.Dense):
                cell.weight.set_data(
                    init.initializer(init.Normal(0.01, 0), cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(
                        init.initializer("zeros", cell.bias.shape, cell.bias.dtype))

    def forward_features(self, x: Tensor) -> Tensor:
        x = self.first_conv(x)
        x = self.max_pool(x)
        x = self.features(x)
        return x

    def forward_head(self, x: Tensor) -> Tensor:
        x = self.global_pool(x)
        x = self.classifier(x)
        return x

    def construct(self, x: Tensor) -> Tensor:
        x = self.forward_features(x)
        x = self.forward_head(x)
        return x
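The stage loop in `__init__` derives every block's configuration from `stage_repeats` and `stage_out_channels`. A minimal pure-Python sketch (illustrative helper names, no MindSpore required) that reproduces the schedule for the `"2.0x"`, `group=3` setting:

```python
# Sketch: the block schedule built by ShuffleNetV1.__init__ for
# model_size="2.0x", group=3. Mirrors the loop without building any cells.
stage_repeats = [4, 8, 4]
stage_out_channels = [-1, 48, 480, 960, 1920]

def block_schedule():
    """Yield (in_ch, out_ch, stride, first_group) per ShuffleV1Block."""
    input_channel = stage_out_channels[1]  # channels after the stem conv
    schedule = []
    for idxstage, numrepeat in enumerate(stage_repeats):
        output_channel = stage_out_channels[idxstage + 2]
        for i in range(numrepeat):
            stride = 2 if i == 0 else 1             # downsample once per stage
            first_group = idxstage == 0 and i == 0  # no group conv in the very first block
            schedule.append((input_channel, output_channel, stride, first_group))
            input_channel = output_channel
    return schedule

sched = block_schedule()  # 16 blocks: 4 + 8 + 4
```

Only the first block of each stage strides; channel width changes happen at the same point, so the stride-2 blocks also carry the concat-based channel growth.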
mindocr.models.backbones.mindcv_models.shufflenetv1.ShuffleV1Block

Bases: nn.Cell

Basic block of ShuffleNetV1: 1x1 group conv -> channel shuffle -> 3x3 depthwise conv -> 1x1 group conv.

Source code in mindocr\models\backbones\mindcv_models\shufflenetv1.py
class ShuffleV1Block(nn.Cell):
    """Basic block of ShuffleNetV1. 1x1 GC -> CS -> 3x3 DWC -> 1x1 GC"""

    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        mid_channels: int,
        stride: int,
        group: int,
        first_group: bool,
    ) -> None:
        super().__init__()
        assert stride in [1, 2]
        self.stride = stride
        self.group = group

        if stride == 2:
            out_channels = out_channels - in_channels

        branch_main_1 = [
            # pw
            nn.Conv2d(in_channels, mid_channels, kernel_size=1, stride=1,
                      group=1 if first_group else group),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(),
        ]

        branch_main_2 = [
            # dw
            nn.Conv2d(mid_channels, mid_channels, kernel_size=3, stride=stride, pad_mode="pad", padding=1,
                      group=mid_channels),
            nn.BatchNorm2d(mid_channels),
            # pw-linear
            nn.Conv2d(mid_channels, out_channels, kernel_size=1, stride=1, group=group),
            nn.BatchNorm2d(out_channels),
        ]
        self.branch_main_1 = nn.SequentialCell(branch_main_1)
        self.branch_main_2 = nn.SequentialCell(branch_main_2)
        if stride == 2:
            self.branch_proj = nn.AvgPool2d(kernel_size=3, stride=2, pad_mode="same")

        self.relu = nn.ReLU()

    def construct(self, x: Tensor) -> Tensor:
        identity = x
        x = self.branch_main_1(x)
        if self.group > 1:
            x = self.channel_shuffle(x)
        x = self.branch_main_2(x)
        if self.stride == 1:
            out = self.relu(identity + x)
        else:
            out = self.relu(ops.concat((self.branch_proj(identity), x), axis=1))

        return out

    def channel_shuffle(self, x: Tensor) -> Tensor:
        batch_size, num_channels, height, width = x.shape

        group_channels = num_channels // self.group
        x = ops.reshape(x, (batch_size, group_channels, self.group, height, width))
        x = ops.transpose(x, (0, 2, 1, 3, 4))
        x = ops.reshape(x, (batch_size, num_channels, height, width))
        return x
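The reshape/transpose/reshape in `channel_shuffle` is a fixed permutation of the channel axis: reshaping channels into a `(group_channels, group)` grid and transposing it sends input channel `i*group + j` to output position `j*group_channels + i`. A pure-Python sketch of that permutation on channel indices (illustrative only, no MindSpore needed):

```python
def channel_shuffle_perm(num_channels: int, group: int) -> list:
    """Channel permutation applied by ShuffleV1Block.channel_shuffle.

    The op views the channel axis as a (group_channels, group) grid,
    transposes it, and flattens: output slot j*group_channels + i reads
    input channel i*group + j.
    """
    group_channels = num_channels // group
    return [i * group + j for j in range(group) for i in range(group_channels)]

perm = channel_shuffle_perm(6, 3)  # -> [0, 3, 1, 4, 2, 5]
```

With 6 channels in 3 groups ([0,1], [2,3], [4,5]), every output block of `group_channels` consecutive slots now draws from two different input groups, which is what lets the next group conv mix information across groups.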
mindocr.models.backbones.mindcv_models.shufflenetv1.shufflenet_v1_g3_x0_5(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get ShuffleNetV1 model with width scaled by 0.5 and 3 groups of GPConv. Refer to the base class models.ShuffleNetV1 for more details.

Source code in mindocr\models\backbones\mindcv_models\shufflenetv1.py
@register_model
def shufflenet_v1_g3_x0_5(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> ShuffleNetV1:
    """Get ShuffleNetV1 model with width scaled by 0.5 and 3 groups of GPConv.
    Refer to the base class `models.ShuffleNetV1` for more details.
    """
    default_cfg = default_cfgs["shufflenet_v1_g3_0.5"]
    model = ShuffleNetV1(group=3, model_size="0.5x", num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.shufflenetv1.shufflenet_v1_g3_x1_0(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get ShuffleNetV1 model with width scaled by 1.0 and 3 groups of GPConv. Refer to the base class models.ShuffleNetV1 for more details.

Source code in mindocr\models\backbones\mindcv_models\shufflenetv1.py
@register_model
def shufflenet_v1_g3_x1_0(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> ShuffleNetV1:
    """Get ShuffleNetV1 model with width scaled by 1.0 and 3 groups of GPConv.
    Refer to the base class `models.ShuffleNetV1` for more details.
    """
    default_cfg = default_cfgs["shufflenet_v1_g3_1.0"]
    model = ShuffleNetV1(group=3, model_size="1.0x", num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.shufflenetv1.shufflenet_v1_g3_x1_5(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get ShuffleNetV1 model with width scaled by 1.5 and 3 groups of GPConv. Refer to the base class models.ShuffleNetV1 for more details.

Source code in mindocr\models\backbones\mindcv_models\shufflenetv1.py
@register_model
def shufflenet_v1_g3_x1_5(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> ShuffleNetV1:
    """Get ShuffleNetV1 model with width scaled by 1.5 and 3 groups of GPConv.
    Refer to the base class `models.ShuffleNetV1` for more details.
    """
    default_cfg = default_cfgs["shufflenet_v1_g3_1.5"]
    model = ShuffleNetV1(group=3, model_size="1.5x", num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.shufflenetv1.shufflenet_v1_g3_x2_0(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get ShuffleNetV1 model with width scaled by 2.0 and 3 groups of GPConv. Refer to the base class models.ShuffleNetV1 for more details.

Source code in mindocr\models\backbones\mindcv_models\shufflenetv1.py
@register_model
def shufflenet_v1_g3_x2_0(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> ShuffleNetV1:
    """Get ShuffleNetV1 model with width scaled by 2.0 and 3 groups of GPConv.
    Refer to the base class `models.ShuffleNetV1` for more details.
    """
    default_cfg = default_cfgs["shufflenet_v1_g3_2.0"]
    model = ShuffleNetV1(group=3, model_size="2.0x", num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.shufflenetv1.shufflenet_v1_g8_x0_5(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get ShuffleNetV1 model with width scaled by 0.5 and 8 groups of GPConv. Refer to the base class models.ShuffleNetV1 for more details.

Source code in mindocr\models\backbones\mindcv_models\shufflenetv1.py
@register_model
def shufflenet_v1_g8_x0_5(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> ShuffleNetV1:
    """Get ShuffleNetV1 model with width scaled by 0.5 and 8 groups of GPConv.
    Refer to the base class `models.ShuffleNetV1` for more details.
    """
    default_cfg = default_cfgs["shufflenet_v1_g8_0.5"]
    model = ShuffleNetV1(group=8, model_size="0.5x", num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.shufflenetv1.shufflenet_v1_g8_x1_0(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get ShuffleNetV1 model with width scaled by 1.0 and 8 groups of GPConv. Refer to the base class models.ShuffleNetV1 for more details.

Source code in mindocr\models\backbones\mindcv_models\shufflenetv1.py
@register_model
def shufflenet_v1_g8_x1_0(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> ShuffleNetV1:
    """Get ShuffleNetV1 model with width scaled by 1.0 and 8 groups of GPConv.
    Refer to the base class `models.ShuffleNetV1` for more details.
    """
    default_cfg = default_cfgs["shufflenet_v1_g8_1.0"]
    model = ShuffleNetV1(group=8, model_size="1.0x", num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.shufflenetv1.shufflenet_v1_g8_x1_5(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get ShuffleNetV1 model with width scaled by 1.5 and 8 groups of GPConv. Refer to the base class models.ShuffleNetV1 for more details.

Source code in mindocr\models\backbones\mindcv_models\shufflenetv1.py
@register_model
def shufflenet_v1_g8_x1_5(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> ShuffleNetV1:
    """Get ShuffleNetV1 model with width scaled by 1.5 and 8 groups of GPConv.
    Refer to the base class `models.ShuffleNetV1` for more details.
    """
    default_cfg = default_cfgs["shufflenet_v1_g8_1.5"]
    model = ShuffleNetV1(group=8, model_size="1.5x", num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.shufflenetv1.shufflenet_v1_g8_x2_0(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get ShuffleNetV1 model with width scaled by 2.0 and 8 groups of GPConv. Refer to the base class models.ShuffleNetV1 for more details.

Source code in mindocr\models\backbones\mindcv_models\shufflenetv1.py
@register_model
def shufflenet_v1_g8_x2_0(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> ShuffleNetV1:
    """Get ShuffleNetV1 model with width scaled by 2.0 and 8 groups of GPConv.
    Refer to the base class `models.ShuffleNetV1` for more details.
    """
    default_cfg = default_cfgs["shufflenet_v1_g8_2.0"]
    model = ShuffleNetV1(group=8, model_size="2.0x", num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.shufflenetv2

MindSpore implementation of ShuffleNetV2. Refer to ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design

mindocr.models.backbones.mindcv_models.shufflenetv2.ShuffleNetV2

Bases: nn.Cell

ShuffleNetV2 model class, based on "ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design" (https://arxiv.org/abs/1807.11164)

PARAMETER DESCRIPTION
num_classes

number of classification classes. Default: 1000.

TYPE: int DEFAULT: 1000

in_channels

number of input channels. Default: 3.

TYPE: int DEFAULT: 3

model_size

scale factor which controls the number of channels. Default: '1.5x'.

TYPE: str DEFAULT: '1.5x'

Source code in mindocr\models\backbones\mindcv_models\shufflenetv2.py
class ShuffleNetV2(nn.Cell):
    r"""ShuffleNetV2 model class, based on
    `"ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design" <https://arxiv.org/abs/1807.11164>`_

    Args:
        num_classes: number of classification classes. Default: 1000.
        in_channels: number of input channels. Default: 3.
        model_size: scale factor which controls the number of channels. Default: '1.5x'.
    """

    def __init__(
        self,
        num_classes: int = 1000,
        in_channels: int = 3,
        model_size: str = "1.5x",
    ):
        super().__init__()

        self.stage_repeats = [4, 8, 4]
        self.model_size = model_size
        if model_size == "0.5x":
            self.stage_out_channels = [-1, 24, 48, 96, 192, 1024]
        elif model_size == "1.0x":
            self.stage_out_channels = [-1, 24, 116, 232, 464, 1024]
        elif model_size == "1.5x":
            self.stage_out_channels = [-1, 24, 176, 352, 704, 1024]
        elif model_size == "2.0x":
            self.stage_out_channels = [-1, 24, 244, 488, 976, 2048]
        else:
            raise NotImplementedError(f"model_size '{model_size}' is not supported.")

        # building first layer
        input_channel = self.stage_out_channels[1]
        self.first_conv = nn.SequentialCell([
            nn.Conv2d(in_channels, input_channel, kernel_size=3, stride=2,
                      pad_mode="pad", padding=1),
            nn.BatchNorm2d(input_channel),
            nn.ReLU(),
        ])
        self.max_pool = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode="same")

        self.features = []
        for idxstage, numrepeat in enumerate(self.stage_repeats):
            output_channel = self.stage_out_channels[idxstage + 2]
            for i in range(numrepeat):
                if i == 0:
                    self.features.append(ShuffleV2Block(input_channel, output_channel,
                                                        mid_channels=output_channel // 2, kernel_size=3, stride=2))
                else:
                    self.features.append(ShuffleV2Block(input_channel // 2, output_channel,
                                                        mid_channels=output_channel // 2, kernel_size=3, stride=1))
                input_channel = output_channel

        self.features = nn.SequentialCell(self.features)

        self.conv_last = nn.SequentialCell([
            nn.Conv2d(input_channel, self.stage_out_channels[-1], kernel_size=1, stride=1),
            nn.BatchNorm2d(self.stage_out_channels[-1]),
            nn.ReLU()
        ])
        self.pool = GlobalAvgPooling()
        self.classifier = nn.Dense(self.stage_out_channels[-1], num_classes, has_bias=False)
        self._initialize_weights()

    def _initialize_weights(self):
        """Initialize weights for cells."""
        for name, cell in self.cells_and_names():
            if isinstance(cell, nn.Conv2d):
                if "first" in name:
                    cell.weight.set_data(
                        init.initializer(init.Normal(0.01, 0), cell.weight.shape, cell.weight.dtype))
                else:
                    cell.weight.set_data(
                        init.initializer(init.Normal(1.0 / cell.weight.shape[1], 0), cell.weight.shape,
                                         cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(
                        init.initializer("zeros", cell.bias.shape, cell.bias.dtype))
            elif isinstance(cell, nn.Dense):
                cell.weight.set_data(
                    init.initializer(init.Normal(0.01, 0), cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(
                        init.initializer("zeros", cell.bias.shape, cell.bias.dtype))

    def forward_features(self, x: Tensor) -> Tensor:
        x = self.first_conv(x)
        x = self.max_pool(x)
        x = self.features(x)
        return x

    def forward_head(self, x: Tensor) -> Tensor:
        x = self.conv_last(x)
        x = self.pool(x)
        x = self.classifier(x)
        return x

    def construct(self, x: Tensor) -> Tensor:
        x = self.forward_features(x)
        x = self.forward_head(x)
        return x
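Compared with V1, the V2 stage loop feeds the first (stride-2) block of each stage the full `input_channel`, while every later block receives only `input_channel // 2`, because the channel split routes the other half around the main branch. A pure-Python sketch of the resulting schedule for `"1.5x"` (illustrative only, mirrors the loop above without constructing cells):

```python
# Sketch: block schedule built by ShuffleNetV2.__init__ for model_size="1.5x".
stage_repeats = [4, 8, 4]
stage_out_channels = [-1, 24, 176, 352, 704, 1024]

def block_schedule():
    """Yield (in_ch, out_ch, mid_ch, stride) per ShuffleV2Block."""
    input_channel = stage_out_channels[1]
    schedule = []
    for idxstage, numrepeat in enumerate(stage_repeats):
        output_channel = stage_out_channels[idxstage + 2]
        for i in range(numrepeat):
            if i == 0:
                # stride-2 block: both branches see the full input
                schedule.append((input_channel, output_channel, output_channel // 2, 2))
            else:
                # stride-1 block: the channel split halves the main-branch input
                schedule.append((input_channel // 2, output_channel, output_channel // 2, 1))
            input_channel = output_channel
    return schedule

sched = block_schedule()  # 16 blocks: 4 + 8 + 4
```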
mindocr.models.backbones.mindcv_models.shufflenetv2.ShuffleV2Block

Bases: nn.Cell

Basic block of ShuffleNetV2: pointwise conv -> depthwise conv -> pointwise conv, with channel split and shuffle.

Source code in mindocr\models\backbones\mindcv_models\shufflenetv2.py
class ShuffleV2Block(nn.Cell):
    """define the basic block of ShuffleV2"""

    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        mid_channels: int,
        kernel_size: int,
        stride: int,
    ) -> None:
        super().__init__()
        assert stride in [1, 2]
        self.stride = stride
        pad = kernel_size // 2
        out_channels = out_channels - in_channels
        branch_main = [
            # pw
            nn.Conv2d(in_channels, mid_channels, kernel_size=1, stride=1),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(),
            # dw
            nn.Conv2d(mid_channels, mid_channels, kernel_size=kernel_size, stride=stride,
                      pad_mode="pad", padding=pad, group=mid_channels),
            nn.BatchNorm2d(mid_channels),
            # pw-linear
            nn.Conv2d(mid_channels, out_channels, kernel_size=1, stride=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(),
        ]
        self.branch_main = nn.SequentialCell(branch_main)

        if stride == 2:
            branch_proj = [
                # dw
                nn.Conv2d(in_channels, in_channels, kernel_size=kernel_size, stride=stride,
                          pad_mode="pad", padding=pad, group=in_channels),
                nn.BatchNorm2d(in_channels),
                # pw-linear
                nn.Conv2d(in_channels, in_channels, kernel_size=1, stride=1),
                nn.BatchNorm2d(in_channels),
                nn.ReLU(),
            ]
            self.branch_proj = nn.SequentialCell(branch_proj)
        else:
            self.branch_proj = None

    def construct(self, old_x: Tensor) -> Tensor:
        if self.stride == 1:
            x_proj, x = self.channel_shuffle(old_x)
            return ops.concat((x_proj, self.branch_main(x)), axis=1)

        if self.stride == 2:
            x_proj = old_x
            x = old_x
            return ops.concat((self.branch_proj(x_proj), self.branch_main(x)), axis=1)
        return None

    @staticmethod
    def channel_shuffle(x: Tensor) -> Tuple[Tensor, Tensor]:
        batch_size, num_channels, height, width = x.shape
        x = ops.reshape(x, (batch_size * num_channels // 2, 2, height * width,))
        x = ops.transpose(x, (1, 0, 2,))
        x = ops.reshape(x, (2, -1, num_channels // 2, height, width,))
        return x[0], x[1]
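In the stride-1 path, `channel_shuffle` both shuffles and splits: flattening the channel axis into pairs and transposing routes the even-indexed channels to the identity branch (`x[0]`) and the odd-indexed channels to the main branch (`x[1]`). A pure-Python sketch of the index arithmetic (illustrative only):

```python
def shuffle_split_indices(num_channels: int):
    """Channel routing performed by ShuffleV2Block.channel_shuffle.

    Reshaping (C,) into (C // 2, 2) pairs neighbouring channels; the
    transpose then collects parity-0 members for the identity branch and
    parity-1 members for the main branch.
    """
    identity = [c for c in range(num_channels) if c % 2 == 0]
    main = [c for c in range(num_channels) if c % 2 == 1]
    return identity, main

ident, main = shuffle_split_indices(8)  # -> ([0, 2, 4, 6], [1, 3, 5, 7])
```

Because each block's concat puts the untouched half first, successive blocks keep alternating which channels pass through untouched, giving the implicit "feature reuse" pattern the V2 paper describes.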
mindocr.models.backbones.mindcv_models.shufflenetv2.shufflenet_v2_x0_5(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get ShuffleNetV2 model with width scaled by 0.5. Refer to the base class models.ShuffleNetV2 for more details.

Source code in mindocr\models\backbones\mindcv_models\shufflenetv2.py
@register_model
def shufflenet_v2_x0_5(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> ShuffleNetV2:
    """Get ShuffleNetV2 model with width scaled by 0.5.
    Refer to the base class `models.ShuffleNetV2` for more details.
    """
    default_cfg = default_cfgs["shufflenet_v2_0.5"]
    model = ShuffleNetV2(model_size="0.5x", num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.shufflenetv2.shufflenet_v2_x1_0(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get ShuffleNetV2 model with width scaled by 1.0. Refer to the base class models.ShuffleNetV2 for more details.

Source code in mindocr\models\backbones\mindcv_models\shufflenetv2.py
@register_model
def shufflenet_v2_x1_0(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> ShuffleNetV2:
    """Get ShuffleNetV2 model with width scaled by 1.0.
    Refer to the base class `models.ShuffleNetV2` for more details.
    """
    default_cfg = default_cfgs["shufflenet_v2_1.0"]
    model = ShuffleNetV2(model_size="1.0x", num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.shufflenetv2.shufflenet_v2_x1_5(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get ShuffleNetV2 model with width scaled by 1.5. Refer to the base class models.ShuffleNetV2 for more details.

Source code in mindocr\models\backbones\mindcv_models\shufflenetv2.py
@register_model
def shufflenet_v2_x1_5(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> ShuffleNetV2:
    """Get ShuffleNetV2 model with width scaled by 1.5.
    Refer to the base class `models.ShuffleNetV2` for more details.
    """
    default_cfg = default_cfgs["shufflenet_v2_1.5"]
    model = ShuffleNetV2(model_size="1.5x", num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.shufflenetv2.shufflenet_v2_x2_0(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get ShuffleNetV2 model with width scaled by 2.0. Refer to the base class models.ShuffleNetV2 for more details.

Source code in mindocr\models\backbones\mindcv_models\shufflenetv2.py
@register_model
def shufflenet_v2_x2_0(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> ShuffleNetV2:
    """Get ShuffleNetV2 model with width scaled by 2.0.
    Refer to the base class `models.ShuffleNetV2` for more details.
    """
    default_cfg = default_cfgs["shufflenet_v2_2.0"]
    model = ShuffleNetV2(model_size="2.0x", num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.sknet

MindSpore implementation of SKNet. Refer to Selective Kernel Networks.

mindocr.models.backbones.mindcv_models.sknet.SKNet

Bases: ResNet

SKNet model class, based on "Selective Kernel Networks" (https://arxiv.org/abs/1903.06586)

PARAMETER DESCRIPTION
block

block of sknet.

TYPE: Type[nn.Cell]

layers

number of layers of each stage.

TYPE: List[int]

num_classes

number of classification classes. Default: 1000.

TYPE: int DEFAULT: 1000

in_channels

number of input channels. Default: 3.

TYPE: int DEFAULT: 3

groups

number of groups for group conv in blocks. Default: 1.

TYPE: int DEFAULT: 1

base_width

base width per group of hidden channels in blocks. Default: 64.

TYPE: int DEFAULT: 64

norm

normalization layer in blocks. Default: None.

TYPE: Optional[nn.Cell] DEFAULT: None

sk_kwargs

kwargs of selective kernel. Default: None.

TYPE: Optional[Dict] DEFAULT: None

Source code in mindocr\models\backbones\mindcv_models\sknet.py
class SKNet(ResNet):
    r"""SKNet model class, based on
    `"Selective Kernel Networks" <https://arxiv.org/abs/1903.06586>`_

    Args:
        block: block of sknet.
        layers: number of layers of each stage.
        num_classes: number of classification classes. Default: 1000.
        in_channels: number of input channels. Default: 3.
        groups: number of groups for group conv in blocks. Default: 1.
        base_width: base width per group of hidden channels in blocks. Default: 64.
        norm: normalization layer in blocks. Default: None.
        sk_kwargs: kwargs of selective kernel. Default: None.
    """

    def __init__(
        self,
        block: Type[nn.Cell],
        layers: List[int],
        num_classes: int = 1000,
        in_channels: int = 3,
        groups: int = 1,
        base_width: int = 64,
        norm: Optional[nn.Cell] = None,
        sk_kwargs: Optional[Dict] = None,
    ) -> None:
        self.sk_kwargs: Optional[Dict] = sk_kwargs  # make pylint happy
        super().__init__(block, layers, num_classes, in_channels, groups, base_width, norm)

    def _make_layer(
        self,
        block: Type[Union[SelectiveKernelBasic, SelectiveKernelBottleneck]],
        channels: int,
        block_nums: int,
        stride: int = 1,
    ) -> nn.SequentialCell:
        down_sample = None

        if stride != 1 or self.input_channels != channels * block.expansion:
            down_sample = nn.SequentialCell([
                nn.Conv2d(self.input_channels, channels * block.expansion, kernel_size=1, stride=stride),
                self.norm(channels * block.expansion)
            ])

        layers = []
        layers.append(
            block(
                self.input_channels,
                channels,
                stride=stride,
                down_sample=down_sample,
                groups=self.groups,
                base_width=self.base_with,
                norm=self.norm,
                sk_kwargs=self.sk_kwargs,
            )
        )
        self.input_channels = channels * block.expansion

        for _ in range(1, block_nums):
            layers.append(
                block(
                    self.input_channels,
                    channels,
                    groups=self.groups,
                    base_width=self.base_with,
                    norm=self.norm,
                    sk_kwargs=self.sk_kwargs,
                )
            )

        return nn.SequentialCell(layers)
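`_make_layer` attaches a 1x1-conv `down_sample` branch only when the residual shapes would otherwise mismatch: on a strided block, or when the incoming channel count differs from `channels * block.expansion`. A minimal pure-Python sketch of that predicate applied to a hypothetical skresnet18-style stage plan (BasicBlock, `expansion = 1`; the stage widths and strides below are illustrative assumptions, not taken from this file):

```python
def needs_down_sample(input_channels: int, channels: int, expansion: int, stride: int) -> bool:
    """Mirror of the condition guarding the down_sample branch in _make_layer."""
    return stride != 1 or input_channels != channels * expansion

# Walk four stages with SelectiveKernelBasic-style expansion=1.
expansion = 1
input_channels = 64  # channels entering the first stage (assumed stem width)
flags = []
for channels, stride in [(64, 1), (128, 2), (256, 2), (512, 2)]:
    flags.append(needs_down_sample(input_channels, channels, expansion, stride))
    input_channels = channels * expansion  # updated after the stage's first block
```

With `expansion = 1` the first stage needs no projection (`64 == 64`, stride 1), while every strided stage does; with the Bottleneck's `expansion = 4`, even the first stage would need one because `64 != 64 * 4`.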
mindocr.models.backbones.mindcv_models.sknet.SelectiveKernelBasic

Bases: nn.Cell

Basic block of SKNet.

Source code in mindocr\models\backbones\mindcv_models\sknet.py
class SelectiveKernelBasic(nn.Cell):
    """build basic block of sknet"""

    expansion = 1

    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        stride: int = 1,
        groups: int = 1,
        down_sample: Optional[nn.Cell] = None,
        base_width: int = 64,
        norm: Optional[nn.Cell] = None,
        sk_kwargs: Optional[Dict] = None,
    ):
        super().__init__()
        if norm is None:
            norm = nn.BatchNorm2d

        if sk_kwargs is None:
            sk_kwargs = {}

        assert groups == 1, "BasicBlock only supports cardinality of 1"
        assert base_width == 64, "BasicBlock does not support changing base width"

        self.conv1 = SelectiveKernel(
            in_channels, out_channels, stride=stride, **sk_kwargs)
        self.conv2 = nn.SequentialCell([
            nn.Conv2d(out_channels, out_channels * self.expansion, kernel_size=3, padding=1, pad_mode="pad"),
            norm(out_channels * self.expansion)
        ])

        self.relu = nn.ReLU()
        self.down_sample = down_sample

    def construct(self, x: Tensor) -> Tensor:
        identity = x

        out = self.conv1(x)
        out = self.conv2(out)

        if self.down_sample is not None:
            identity = self.down_sample(x)
        out += identity
        out = self.relu(out)
        return out
mindocr.models.backbones.mindcv_models.sknet.SelectiveKernelBottleneck

Bases: nn.Cell

Bottleneck block of SKNet.

Source code in mindocr\models\backbones\mindcv_models\sknet.py
class SelectiveKernelBottleneck(nn.Cell):
    """build the bottleneck of the sknet"""

    expansion = 4

    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        stride: int = 1,
        down_sample: Optional[nn.Cell] = None,
        groups: int = 1,
        base_width: int = 64,
        norm: Optional[nn.Cell] = None,
        sk_kwargs: Optional[Dict] = None,
    ):
        super().__init__()
        if norm is None:
            norm = nn.BatchNorm2d

        if sk_kwargs is None:
            sk_kwargs = {}

        width = int(out_channels * (base_width / 64.0)) * groups
        self.conv1 = nn.SequentialCell([
            nn.Conv2d(in_channels, width, kernel_size=1),
            norm(width)
        ])
        self.conv2 = SelectiveKernel(
            width, width, stride=stride, groups=groups, **sk_kwargs)
        self.conv3 = nn.SequentialCell([
            nn.Conv2d(width, out_channels * self.expansion, kernel_size=1),
            norm(out_channels * self.expansion)
        ])

        self.relu = nn.ReLU()
        self.down_sample = down_sample

    def construct(self, x: Tensor) -> Tensor:
        identity = x

        out = self.conv1(x)
        out = self.conv2(out)
        out = self.conv3(out)

        if self.down_sample:
            identity = self.down_sample(x)
        out += identity
        out = self.relu(out)
        return out
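The width of the bottleneck's inner branch follows the same arithmetic as ResNeXt: `width = int(out_channels * (base_width / 64.0)) * groups`, as computed in `SelectiveKernelBottleneck.__init__` above. A minimal sketch of that formula (the helper name is illustrative, not part of the library):

```python
def bottleneck_width(out_channels: int, base_width: int = 64, groups: int = 1) -> int:
    """Mirror the width computation in SelectiveKernelBottleneck.__init__."""
    return int(out_channels * (base_width / 64.0)) * groups

# Plain bottleneck defaults: inner width equals out_channels
assert bottleneck_width(256) == 256
# ResNeXt-style 32x4d settings (as used by skresnext50_32x4d): width doubles
assert bottleneck_width(128, base_width=4, groups=32) == 256
```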
mindocr.models.backbones.mindcv_models.sknet.skresnet18(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get the 18-layer SKNet model. Refer to the base class models.SKNet for more details.

Source code in mindocr\models\backbones\mindcv_models\sknet.py
@register_model
def skresnet18(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> ResNet:
    """Get 18 layers SKNet model.
    Refer to the base class `models.SKNet` for more details.
    """
    default_cfg = default_cfgs["skresnet18"]
    sk_kwargs = dict(rd_ratio=1 / 8, rd_divisor=16, split_input=True)
    model = SKNet(SelectiveKernelBasic, [2, 2, 2, 2], num_classes=num_classes, in_channels=in_channels,
                  sk_kwargs=sk_kwargs, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.sknet.skresnet34(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get the 34-layer SKNet model. Refer to the base class models.SKNet for more details.

Source code in mindocr\models\backbones\mindcv_models\sknet.py
@register_model
def skresnet34(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> ResNet:
    """Get 34 layers SKNet model.
    Refer to the base class `models.SKNet` for more details.
    """
    default_cfg = default_cfgs["skresnet34"]
    sk_kwargs = dict(rd_ratio=1 / 8, rd_divisor=16, split_input=True)
    model = SKNet(SelectiveKernelBasic, [3, 4, 6, 3], num_classes=num_classes, in_channels=in_channels,
                  sk_kwargs=sk_kwargs, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.sknet.skresnet50(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get the 50-layer SKNet model. Refer to the base class models.SKNet for more details.

Source code in mindocr\models\backbones\mindcv_models\sknet.py
@register_model
def skresnet50(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> ResNet:
    """Get 50 layers SKNet model.
    Refer to the base class `models.SKNet` for more details.
    """
    default_cfg = default_cfgs["skresnet50"]
    sk_kwargs = dict(split_input=True)
    model = SKNet(SelectiveKernelBottleneck, [3, 4, 6, 3], num_classes=num_classes, in_channels=in_channels,
                  sk_kwargs=sk_kwargs, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.sknet.skresnext50_32x4d(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get the 50-layer SKNeXt model with 32 groups of GPConv. Refer to the base class models.SKNet for more details.

Source code in mindocr\models\backbones\mindcv_models\sknet.py
@register_model
def skresnext50_32x4d(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> ResNet:
    """Get 50 layers SKNeXt model with 32 groups of GPConv.
    Refer to the base class `models.SKNet` for more details.
    """
    default_cfg = default_cfgs["skresnext50_32x4d"]
    sk_kwargs = dict(rd_ratio=1 / 16, rd_divisor=32, split_input=False)
    model = SKNet(SelectiveKernelBottleneck, [3, 4, 6, 3], num_classes=num_classes, in_channels=in_channels,
                  sk_kwargs=sk_kwargs, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.squeezenet

MindSpore implementation of SqueezeNet. Refer to SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size.

mindocr.models.backbones.mindcv_models.squeezenet.Fire

Bases: nn.Cell

Define the Fire module, the basic building block of SqueezeNet.

Source code in mindocr\models\backbones\mindcv_models\squeezenet.py
class Fire(nn.Cell):
    """define the basic block of squeezenet"""

    def __init__(
        self,
        in_channels: int,
        squeeze_channels: int,
        expand1x1_channels: int,
        expand3x3_channels: int,
    ) -> None:
        super().__init__()
        self.squeeze = nn.Conv2d(in_channels, squeeze_channels, kernel_size=1, has_bias=True)
        self.squeeze_activation = nn.ReLU()
        self.expand1x1 = nn.Conv2d(squeeze_channels, expand1x1_channels, kernel_size=1, has_bias=True)
        self.expand1x1_activation = nn.ReLU()
        self.expand3x3 = nn.Conv2d(squeeze_channels, expand3x3_channels, kernel_size=3, pad_mode="same", has_bias=True)
        self.expand3x3_activation = nn.ReLU()

    def construct(self, x: Tensor) -> Tensor:
        x = self.squeeze_activation(self.squeeze(x))
        return ops.concat((self.expand1x1_activation(self.expand1x1(x)),
                           self.expand3x3_activation(self.expand3x3(x))), axis=1)
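Because `construct` concatenates the two expand branches along the channel axis, a Fire block's output channel count is simply the sum of its expand widths. A small sketch of that bookkeeping (the helper is illustrative, not a library function):

```python
def fire_out_channels(expand1x1_channels: int, expand3x3_channels: int) -> int:
    # The Fire block concatenates the 1x1 and 3x3 expand branches on axis=1,
    # so output channels are the sum of the two expand widths.
    return expand1x1_channels + expand3x3_channels

# Fire(96, 16, 64, 64) produces 128 channels, matching the next block's
# in_channels in the SqueezeNet 1.0 feature stack below.
assert fire_out_channels(64, 64) == 128
assert fire_out_channels(256, 256) == 512
```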
mindocr.models.backbones.mindcv_models.squeezenet.SqueezeNet

Bases: nn.Cell

SqueezeNet model class, based on "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size" <https://arxiv.org/abs/1602.07360>_ # noqa: E501

.. note:: Important: In contrast to the other models, SqueezeNet expects tensors with a size of N x 3 x 227 x 227, so ensure your images are sized accordingly.

PARAMETER DESCRIPTION
version

version of the architecture, '1_0' or '1_1'. Default: '1_0'.

TYPE: str DEFAULT: '1_0'

num_classes

number of classification classes. Default: 1000.

TYPE: int DEFAULT: 1000

drop_rate

dropout rate of the classifier. Default: 0.5.

TYPE: float DEFAULT: 0.5

in_channels

number of input channels. Default: 3.

TYPE: int DEFAULT: 3

Source code in mindocr\models\backbones\mindcv_models\squeezenet.py
class SqueezeNet(nn.Cell):
    r"""SqueezeNet model class, based on
    `"SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size" <https://arxiv.org/abs/1602.07360>`_  # noqa: E501

    .. note::
        **Important**: In contrast to the other models, SqueezeNet expects tensors with a size of
        N x 3 x 227 x 227, so ensure your images are sized accordingly.

    Args:
        version: version of the architecture, '1_0' or '1_1'. Default: '1_0'.
        num_classes: number of classification classes. Default: 1000.
        drop_rate: dropout rate of the classifier. Default: 0.5.
        in_channels: number of input channels. Default: 3.
    """

    def __init__(
        self,
        version: str = "1_0",
        num_classes: int = 1000,
        drop_rate: float = 0.5,
        in_channels: int = 3,
    ) -> None:
        super().__init__()
        if version == "1_0":
            self.features = nn.SequentialCell([
                nn.Conv2d(in_channels, 96, kernel_size=7, stride=2, pad_mode="valid", has_bias=True),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=3, stride=2),
                Fire(96, 16, 64, 64),
                Fire(128, 16, 64, 64),
                Fire(128, 32, 128, 128),
                nn.MaxPool2d(kernel_size=3, stride=2),
                Fire(256, 32, 128, 128),
                Fire(256, 48, 192, 192),
                Fire(384, 48, 192, 192),
                Fire(384, 64, 256, 256),
                nn.MaxPool2d(kernel_size=3, stride=2),
                Fire(512, 64, 256, 256),
            ])
        elif version == "1_1":
            self.features = nn.SequentialCell([
                nn.Conv2d(in_channels, 64, kernel_size=3, stride=2, padding=1, pad_mode="pad", has_bias=True),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=3, stride=2),
                Fire(64, 16, 64, 64),
                Fire(128, 16, 64, 64),
                nn.MaxPool2d(kernel_size=3, stride=2),
                Fire(128, 32, 128, 128),
                Fire(256, 32, 128, 128),
                nn.MaxPool2d(kernel_size=3, stride=2),
                Fire(256, 48, 192, 192),
                Fire(384, 48, 192, 192),
                Fire(384, 64, 256, 256),
                Fire(512, 64, 256, 256),
            ])
        else:
            raise ValueError(f"Unsupported SqueezeNet version {version}: 1_0 or 1_1 expected")

        self.final_conv = nn.Conv2d(512, num_classes, kernel_size=1, has_bias=True)
        self.classifier = nn.SequentialCell([
            nn.Dropout(keep_prob=1 - drop_rate),
            self.final_conv,
            nn.ReLU(),
            GlobalAvgPooling()
        ])
        self._initialize_weights()

    def _initialize_weights(self):
        """Initialize weights for cells."""
        for _, cell in self.cells_and_names():
            if isinstance(cell, nn.Conv2d):
                if cell is self.final_conv:
                    cell.weight.set_data(init.initializer(init.Normal(), cell.weight.shape, cell.weight.dtype))
                else:
                    cell.weight.set_data(init.initializer(init.HeUniform(), cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(init.initializer("zeros", cell.bias.shape, cell.bias.dtype))

    def forward_features(self, x: Tensor) -> Tensor:
        x = self.features(x)
        return x

    def forward_head(self, x: Tensor) -> Tensor:
        x = self.classifier(x)
        return x

    def construct(self, x: Tensor) -> Tensor:
        x = self.forward_features(x)
        x = self.forward_head(x)
        return x
mindocr.models.backbones.mindcv_models.squeezenet.squeezenet1_0(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get SqueezeNet model of version 1.0. Refer to the base class models.SqueezeNet for more details.

Source code in mindocr\models\backbones\mindcv_models\squeezenet.py
@register_model
def squeezenet1_0(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> SqueezeNet:
    """Get SqueezeNet model of version 1.0.
    Refer to the base class `models.SqueezeNet` for more details.
    """
    default_cfg = default_cfgs["squeezenet1_0"]
    model = SqueezeNet(version="1_0", num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.squeezenet.squeezenet1_1(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get SqueezeNet model of version 1.1. Refer to the base class models.SqueezeNet for more details.

Source code in mindocr\models\backbones\mindcv_models\squeezenet.py
@register_model
def squeezenet1_1(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> SqueezeNet:
    """Get SqueezeNet model of version 1.1.
    Refer to the base class `models.SqueezeNet` for more details.
    """
    default_cfg = default_cfgs["squeezenet1_1"]
    model = SqueezeNet(version="1_1", num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.swin_transformer

Define the SwinTransformer model.

mindocr.models.backbones.mindcv_models.swin_transformer.BasicLayer

Bases: nn.Cell

A basic Swin Transformer layer for one stage.

PARAMETER DESCRIPTION
dim

Number of input channels.

TYPE: int

input_resolution

Input resolution.

TYPE: tuple[int]

depth

Number of blocks.

TYPE: int

num_heads

Number of attention heads.

TYPE: int

window_size

Local window size.

TYPE: int

mlp_ratio

Ratio of mlp hidden dim to embedding dim.

TYPE: float DEFAULT: 4.0

qkv_bias

If True, add a learnable bias to query, key, value. Default: True

TYPE: bool DEFAULT: True

qk_scale

Override default qk scale of head_dim ** -0.5 if set.

TYPE: float | None DEFAULT: None

drop

Dropout rate. Default: 0.0

TYPE: float DEFAULT: 0.0

attn_drop

Attention dropout rate. Default: 0.0

TYPE: float DEFAULT: 0.0

drop_path

Stochastic depth rate. Default: 0.0

TYPE: float | tuple[float] DEFAULT: 0.0

norm_layer

Normalization layer. Default: nn.LayerNorm

TYPE: nn.Cell DEFAULT: nn.LayerNorm

downsample

Downsample layer at the end of the layer. Default: None

TYPE: nn.Cell | None DEFAULT: None

Source code in mindocr\models\backbones\mindcv_models\swin_transformer.py
class BasicLayer(nn.Cell):
    """A basic Swin Transformer layer for one stage.

    Args:
        dim (int): Number of input channels.
        input_resolution (tuple[int]): Input resolution.
        depth (int): Number of blocks.
        num_heads (int): Number of attention heads.
        window_size (int): Local window size.
        mlp_ratio (float): Ratio of mlp hidden dim to embedding dim.
        qkv_bias (bool, optional): If True, add a learnable bias to query, key, value. Default: True
        qk_scale (float | None, optional): Override default qk scale of head_dim ** -0.5 if set.
        drop (float, optional): Dropout rate. Default: 0.0
        attn_drop (float, optional): Attention dropout rate. Default: 0.0
        drop_path (float | tuple[float], optional): Stochastic depth rate. Default: 0.0
        norm_layer (nn.Cell, optional): Normalization layer. Default: nn.LayerNorm
        downsample (nn.Cell | None, optional): Downsample layer at the end of the layer. Default: None
    """

    def __init__(
        self,
        dim: int,
        input_resolution: Tuple[int],
        depth: int,
        num_heads: int,
        window_size: int,
        mlp_ratio: float = 4.0,
        qkv_bias: bool = True,
        qk_scale: Optional[float] = None,
        drop: float = 0.0,
        attn_drop: float = 0.0,
        drop_path: Optional[float] = 0.0,
        norm_layer: Optional[nn.Cell] = nn.LayerNorm,
        downsample: Optional[nn.Cell] = None,
    ) -> None:
        super().__init__()
        self.dim = dim
        self.input_resolution = input_resolution
        self.depth = depth

        # build blocks
        self.blocks = nn.CellList([
            SwinTransformerBlock(dim=dim, input_resolution=input_resolution,
                                 num_heads=num_heads, window_size=window_size,
                                 shift_size=0 if (i % 2 == 0) else window_size // 2,  # TODO: noticeably slow when shift_size is window_size // 2
                                 mlp_ratio=mlp_ratio,
                                 qkv_bias=qkv_bias, qk_scale=qk_scale,
                                 drop=drop, attn_drop=attn_drop,
                                 drop_path=drop_path[i] if isinstance(drop_path, list) else drop_path,
                                 norm_layer=norm_layer)
            for i in range(depth)])

        # patch merging layer
        if downsample is not None:
            self.downsample = downsample(input_resolution, dim=dim, norm_layer=norm_layer)
        else:
            self.downsample = None

    def construct(self, x: Tensor) -> Tensor:
        for blk in self.blocks:
            x = blk(x)
        if self.downsample is not None:
            x = self.downsample(x)
        return x

    def extra_repr(self) -> str:
        return f"dim={self.dim}, input_resolution={self.input_resolution}, depth={self.depth}"
mindocr.models.backbones.mindcv_models.swin_transformer.PatchEmbed

Bases: nn.Cell

Image to Patch Embedding

PARAMETER DESCRIPTION
image_size

Image size. Default: 224.

TYPE: int DEFAULT: 224

patch_size

Patch token size. Default: 4.

TYPE: int DEFAULT: 4

in_chans

Number of input image channels. Default: 3.

TYPE: int DEFAULT: 3

embed_dim

Number of linear projection output channels. Default: 96.

TYPE: int DEFAULT: 96

norm_layer

Normalization layer. Default: None

TYPE: nn.Cell DEFAULT: None

Source code in mindocr\models\backbones\mindcv_models\swin_transformer.py
class PatchEmbed(nn.Cell):
    """Image to Patch Embedding

    Args:
        image_size (int): Image size.  Default: 224.
        patch_size (int): Patch token size. Default: 4.
        in_chans (int): Number of input image channels. Default: 3.
        embed_dim (int): Number of linear projection output channels. Default: 96.
        norm_layer (nn.Cell, optional): Normalization layer. Default: None
    """

    def __init__(
        self,
        image_size: int = 224,
        patch_size: int = 4,
        in_chans: int = 3,
        embed_dim: int = 96,
        norm_layer: Optional[nn.Cell] = None,
    ) -> None:
        super().__init__()
        image_size = to_2tuple(image_size)
        patch_size = to_2tuple(patch_size)
        patches_resolution = [image_size[0] // patch_size[0], image_size[1] // patch_size[1]]
        self.image_size = image_size
        self.patch_size = patch_size
        self.patches_resolution = patches_resolution
        self.num_patches = patches_resolution[0] * patches_resolution[1]

        self.in_chans = in_chans
        self.embed_dim = embed_dim

        self.proj = nn.Conv2d(in_channels=in_chans, out_channels=embed_dim, kernel_size=patch_size, stride=patch_size,
                              pad_mode="pad", has_bias=True, weight_init="TruncatedNormal")

        if norm_layer is not None:
            if isinstance(embed_dim, int):
                embed_dim = (embed_dim,)
            self.norm = norm_layer(embed_dim, epsilon=1e-5)
        else:
            self.norm = None

    def construct(self, x: Tensor) -> Tensor:
        b = x.shape[0]
        # FIXME look at relaxing size constraints
        x = ops.reshape(self.proj(x), (b, self.embed_dim, -1))  # b Ph*Pw c
        x = ops.transpose(x, (0, 2, 1))

        if self.norm is not None:
            x = self.norm(x)
        return x
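The patch grid and patch count computed in `PatchEmbed.__init__` reduce to simple integer division: a 224x224 image with 4x4 patches yields a 56x56 grid of 3136 patches. A sketch of that arithmetic (helper name is illustrative):

```python
def patch_grid(image_size: int = 224, patch_size: int = 4):
    """Mirror the patches_resolution / num_patches computation in PatchEmbed."""
    h = image_size // patch_size
    w = image_size // patch_size
    return (h, w), h * w

resolution, num_patches = patch_grid(224, 4)
assert resolution == (56, 56)
assert num_patches == 3136  # sequence length fed into the first Swin stage
```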
mindocr.models.backbones.mindcv_models.swin_transformer.PatchMerging

Bases: nn.Cell

Patch Merging Layer.

PARAMETER DESCRIPTION
input_resolution

Resolution of input feature.

TYPE: tuple[int]

dim

Number of input channels.

TYPE: int

norm_layer

Normalization layer. Default: nn.LayerNorm

TYPE: nn.Cell DEFAULT: nn.LayerNorm

Source code in mindocr\models\backbones\mindcv_models\swin_transformer.py
class PatchMerging(nn.Cell):
    """Patch Merging Layer.

    Args:
        input_resolution (tuple[int]): Resolution of input feature.
        dim (int): Number of input channels.
        norm_layer (nn.Cell, optional): Normalization layer.  Default: nn.LayerNorm
    """

    def __init__(
        self,
        input_resolution: Tuple[int],
        dim: int,
        norm_layer: Optional[nn.Cell] = nn.LayerNorm,
    ) -> None:
        super().__init__()
        self.input_resolution = input_resolution
        self.dim = dim[0] if isinstance(dim, tuple) and len(dim) == 1 else dim
        # reduce the 4*C merged channels to 2*C; bias disabled, as in the original Swin implementation
        self.reduction = nn.Dense(in_channels=4 * dim, out_channels=2 * dim, has_bias=False)
        self.norm = norm_layer([dim * 4, ])
        self.H, self.W = self.input_resolution
        self.H_2, self.W_2 = self.H // 2, self.W // 2
        self.H2W2 = int(self.H * self.W // 4)
        self.dim_mul_4 = int(dim * 4)

    def construct(self, x: Tensor) -> Tensor:
        """
        x: B, H*W, C
        """
        b = x.shape[0]
        x = ops.reshape(x, (b, self.H_2, 2, self.W_2, 2, self.dim))
        x = ops.transpose(x, (0, 1, 3, 4, 2, 5))
        x = ops.reshape(x, (b, self.H2W2, self.dim_mul_4))
        x = self.norm(x)
        x = self.reduction(x)

        return x

    def extra_repr(self) -> str:
        return f"input_resolution={self.input_resolution}, dim={self.dim}"
mindocr.models.backbones.mindcv_models.swin_transformer.PatchMerging.construct(x)
Source code in mindocr\models\backbones\mindcv_models\swin_transformer.py
def construct(self, x: Tensor) -> Tensor:
    """
    x: B, H*W, C
    """
    b = x.shape[0]
    x = ops.reshape(x, (b, self.H_2, 2, self.W_2, 2, self.dim))
    x = ops.transpose(x, (0, 1, 3, 4, 2, 5))
    x = ops.reshape(x, (b, self.H2W2, self.dim_mul_4))
    x = self.norm(x)
    x = self.reduction(x)

    return x
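The reshape/transpose sequence in `construct` merges each 2x2 neighborhood of patches into one token, halving the spatial resolution and quadrupling the channel dimension: (B, H*W, C) becomes (B, H/2 * W/2, 4C) before the norm and linear reduction. A NumPy sketch of the same index gymnastics (function name is illustrative):

```python
import numpy as np

def patch_merging(x: np.ndarray, h: int, w: int) -> np.ndarray:
    """2x2 neighbor merge: (B, H*W, C) -> (B, H/2*W/2, 4C), before norm/reduction."""
    b, _, c = x.shape
    x = x.reshape(b, h // 2, 2, w // 2, 2, c)     # expose the 2x2 sub-grid
    x = x.transpose(0, 1, 3, 4, 2, 5)             # group the four neighbors together
    return x.reshape(b, (h * w) // 4, 4 * c)      # stack them on the channel axis

x = np.arange(2 * 8 * 8 * 96, dtype=np.float32).reshape(2, 64, 96)
y = patch_merging(x, 8, 8)
assert y.shape == (2, 16, 384)
```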
mindocr.models.backbones.mindcv_models.swin_transformer.SwinTransformer

Bases: nn.Cell

SwinTransformer model class, based on "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" <https://arxiv.org/pdf/2103.14030>_

PARAMETER DESCRIPTION
image_size

Input image size. Default 224

TYPE: int | tuple(int) DEFAULT: 224

patch_size

Patch size. Default: 4

TYPE: int | tuple(int) DEFAULT: 4

in_chans

Number of input image channels. Default: 3

TYPE: int DEFAULT: 3

num_classes

Number of classes for classification head. Default: 1000

TYPE: int DEFAULT: 1000

embed_dim

Patch embedding dimension. Default: 96

TYPE: int DEFAULT: 96

depths

Depth of each Swin Transformer layer.

TYPE: tuple(int) DEFAULT: None

num_heads

Number of attention heads in different layers.

TYPE: tuple(int) DEFAULT: None

window_size

Window size. Default: 7

TYPE: int DEFAULT: 7

mlp_ratio

Ratio of mlp hidden dim to embedding dim. Default: 4

TYPE: float DEFAULT: 4.0

qkv_bias

If True, add a learnable bias to query, key, value. Default: True

TYPE: bool DEFAULT: True

qk_scale

Override default qk scale of head_dim ** -0.5 if set. Default: None

TYPE: float DEFAULT: None

drop_rate

Dropout rate. Default: 0

TYPE: float DEFAULT: 0.0

attn_drop_rate

Attention dropout rate. Default: 0

TYPE: float DEFAULT: 0.0

drop_path_rate

Stochastic depth rate. Default: 0.1

TYPE: float DEFAULT: 0.1

norm_layer

Normalization layer. Default: nn.LayerNorm.

TYPE: nn.Cell DEFAULT: nn.LayerNorm

ape

If True, add absolute position embedding to the patch embedding. Default: False

TYPE: bool DEFAULT: False

patch_norm

If True, add normalization after patch embedding. Default: True

TYPE: bool DEFAULT: True

Source code in mindocr\models\backbones\mindcv_models\swin_transformer.py
class SwinTransformer(nn.Cell):
    r"""SwinTransformer model class, based on
    `"Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" <https://arxiv.org/pdf/2103.14030>`_

    Args:
        image_size (int | tuple(int)): Input image size. Default 224
        patch_size (int | tuple(int)): Patch size. Default: 4
        in_chans (int): Number of input image channels. Default: 3
        num_classes (int): Number of classes for classification head. Default: 1000
        embed_dim (int): Patch embedding dimension. Default: 96
        depths (tuple(int)): Depth of each Swin Transformer layer.
        num_heads (tuple(int)): Number of attention heads in different layers.
        window_size (int): Window size. Default: 7
        mlp_ratio (float): Ratio of mlp hidden dim to embedding dim. Default: 4
        qkv_bias (bool): If True, add a learnable bias to query, key, value. Default: True
        qk_scale (float): Override default qk scale of head_dim ** -0.5 if set. Default: None
        drop_rate (float): Dropout rate. Default: 0
        attn_drop_rate (float): Attention dropout rate. Default: 0
        drop_path_rate (float): Stochastic depth rate. Default: 0.1
        norm_layer (nn.Cell): Normalization layer. Default: nn.LayerNorm.
        ape (bool): If True, add absolute position embedding to the patch embedding. Default: False
        patch_norm (bool): If True, add normalization after patch embedding. Default: True
    """

    def __init__(
        self,
        image_size: int = 224,
        patch_size: int = 4,
        in_chans: int = 3,
        num_classes: int = 1000,
        embed_dim: int = 96,
        depths: Optional[List[int]] = None,
        num_heads: Optional[List[int]] = None,
        window_size: int = 7,
        mlp_ratio: float = 4.0,
        qkv_bias: bool = True,
        qk_scale: Optional[float] = None,
        drop_rate: float = 0.0,
        attn_drop_rate: float = 0.0,
        drop_path_rate: float = 0.1,
        norm_layer: Optional[nn.Cell] = nn.LayerNorm,
        ape: bool = False,
        patch_norm: bool = True,
    ) -> None:
        super().__init__()

        self.num_classes = num_classes
        self.num_layers = len(depths)
        self.embed_dim = embed_dim
        self.ape = ape
        self.patch_norm = patch_norm
        self.num_features = int(embed_dim * 2 ** (self.num_layers - 1))
        self.mlp_ratio = mlp_ratio

        # split image into non-overlapping patches
        self.patch_embed = PatchEmbed(
            image_size=image_size, patch_size=patch_size, in_chans=in_chans, embed_dim=embed_dim,
            norm_layer=norm_layer if self.patch_norm else None)
        num_patches = self.patch_embed.num_patches
        patches_resolution = self.patch_embed.patches_resolution
        self.patches_resolution = patches_resolution

        # absolute position embedding
        if self.ape:
            self.absolute_pos_embed = Parameter(Tensor(np.zeros((1, num_patches, embed_dim)), dtype=mstype.float32))

        self.pos_drop = nn.Dropout(keep_prob=1.0 - drop_rate)

        # stochastic depth
        dpr = [x for x in np.linspace(0, drop_path_rate, sum(depths))]  # stochastic depth decay rule

        # build layers
        self.layers = nn.CellList()
        for i_layer in range(self.num_layers):
            layer = BasicLayer(dim=int(embed_dim * 2 ** i_layer),
                               input_resolution=(patches_resolution[0] // (2 ** i_layer),
                                                 patches_resolution[1] // (2 ** i_layer)),
                               depth=depths[i_layer],
                               num_heads=num_heads[i_layer],
                               window_size=window_size,
                               mlp_ratio=self.mlp_ratio,
                               qkv_bias=qkv_bias, qk_scale=qk_scale,
                               drop=drop_rate, attn_drop=attn_drop_rate,
                               drop_path=dpr[sum(depths[:i_layer]):sum(depths[:i_layer + 1])],
                               norm_layer=norm_layer,
                               downsample=PatchMerging if (i_layer < self.num_layers - 1) else None)
            self.layers.append(layer)

        self.norm = norm_layer([self.num_features, ], epsilon=1e-5)
        self.classifier = nn.Dense(in_channels=self.num_features,
                                   out_channels=num_classes, has_bias=True) if num_classes > 0 else Identity()
        self._initialize_weights()

    def _initialize_weights(self) -> None:
        """Initialize weights for cells."""
        for _, cell in self.cells_and_names():
            if isinstance(cell, nn.Dense):
                cell.weight.set_data(init.initializer(init.TruncatedNormal(sigma=0.02),
                                                      cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(init.initializer(init.Zero(), cell.bias.shape, cell.bias.dtype))
            elif isinstance(cell, nn.LayerNorm):
                cell.gamma.set_data(init.initializer(init.One(), cell.gamma.shape, cell.gamma.dtype))
                cell.beta.set_data(init.initializer(init.Zero(), cell.beta.shape, cell.beta.dtype))

    def no_weight_decay(self):
        return {"absolute_pos_embed"}

    def no_weight_decay_keywords(self):
        return {"relative_position_bias_table"}

    def forward_head(self, x: Tensor) -> Tensor:
        x = self.classifier(x)
        return x

    def forward_features(self, x: Tensor) -> Tensor:
        x = self.patch_embed(x)
        if self.ape:
            x = x + self.absolute_pos_embed
        x = self.pos_drop(x)
        for layer in self.layers:
            x = layer(x)
        x = self.norm(x)  # B L C
        x = ops.mean(ops.transpose(x, (0, 2, 1)), 2)  # B C 1
        return x

    def construct(self, x: Tensor) -> Tensor:
        x = self.forward_features(x)
        x = self.forward_head(x)
        return x
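Two pieces of bookkeeping in `__init__` above are worth spelling out: the final feature width is `embed_dim * 2**(num_layers - 1)` (each `PatchMerging` doubles the channels), and the stochastic-depth rates `dpr` ramp linearly from 0 to `drop_path_rate` across all blocks, then get sliced per stage. A sketch with the Swin-T configuration:

```python
import numpy as np

embed_dim, depths, drop_path_rate = 96, [2, 2, 6, 2], 0.1
num_layers = len(depths)

# channels entering the final norm/classifier: doubled at each of the 3 merges
num_features = int(embed_dim * 2 ** (num_layers - 1))
assert num_features == 768

# stochastic depth decay rule: one rate per block, linear from 0 to drop_path_rate
dpr = list(np.linspace(0, drop_path_rate, sum(depths)))
per_stage = [dpr[sum(depths[:i]):sum(depths[:i + 1])] for i in range(num_layers)]
assert [len(s) for s in per_stage] == depths       # each stage gets depth-many rates
assert abs(per_stage[-1][-1] - drop_path_rate) < 1e-9  # last block gets the max rate
```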
mindocr.models.backbones.mindcv_models.swin_transformer.SwinTransformerBlock

Bases: nn.Cell

Swin Transformer Block.

PARAMETER DESCRIPTION
dim

Number of input channels.

TYPE: int

input_resolution

Input resolution.

TYPE: tuple[int]

num_heads

Number of attention heads.

TYPE: int

window_size

Window size.

TYPE: int DEFAULT: 7

shift_size

Shift size for SW-MSA.

TYPE: int DEFAULT: 0

mlp_ratio

Ratio of mlp hidden dim to embedding dim.

TYPE: float DEFAULT: 4.0

qkv_bias

If True, add a learnable bias to query, key, value. Default: True

TYPE: bool DEFAULT: True

qk_scale

Override default qk scale of head_dim ** -0.5 if set.

TYPE: float | None DEFAULT: None

drop

Dropout rate. Default: 0.0

TYPE: float DEFAULT: 0.0

attn_drop

Attention dropout rate. Default: 0.0

TYPE: float DEFAULT: 0.0

drop_path

Stochastic depth rate. Default: 0.0

TYPE: float DEFAULT: 0.0

act_layer

Activation layer. Default: nn.GELU

TYPE: nn.Cell DEFAULT: nn.GELU

norm_layer

Normalization layer. Default: nn.LayerNorm

TYPE: nn.Cell DEFAULT: nn.LayerNorm

Source code in mindocr\models\backbones\mindcv_models\swin_transformer.py
class SwinTransformerBlock(nn.Cell):
    """Swin Transformer Block.

    Args:
        dim (int): Number of input channels.
        input_resolution (tuple[int]): Input resolution.
        num_heads (int): Number of attention heads.
        window_size (int): Window size.
        shift_size (int): Shift size for SW-MSA.
        mlp_ratio (float): Ratio of mlp hidden dim to embedding dim.
        qkv_bias (bool, optional): If True, add a learnable bias to query, key, value. Default: True
        qk_scale (float | None, optional): Override default qk scale of head_dim ** -0.5 if set.
        drop (float, optional): Dropout rate. Default: 0.0
        attn_drop (float, optional): Attention dropout rate. Default: 0.0
        drop_path (float, optional): Stochastic depth rate. Default: 0.0
        act_layer (nn.Cell, optional): Activation layer. Default: nn.GELU
        norm_layer (nn.Cell, optional): Normalization layer.  Default: nn.LayerNorm
    """

    def __init__(
        self,
        dim: int,
        input_resolution: Tuple[int],
        num_heads: int,
        window_size: int = 7,
        shift_size: int = 0,
        mlp_ratio: float = 4.0,
        qkv_bias: bool = True,
        qk_scale: Optional[float] = None,
        drop: float = 0.0,
        attn_drop: float = 0.0,
        drop_path: float = 0.0,
        act_layer: Optional[nn.Cell] = nn.GELU,
        norm_layer: Optional[nn.Cell] = nn.LayerNorm,
    ) -> None:
        super(SwinTransformerBlock, self).__init__()
        self.dim = dim
        self.input_resolution = input_resolution
        self.num_heads = num_heads
        self.window_size = window_size
        self.shift_size = shift_size
        self.mlp_ratio = mlp_ratio
        if min(self.input_resolution) <= self.window_size:
            # if window size is larger than input resolution, we don't partition windows
            self.shift_size = 0
            self.window_size = min(self.input_resolution)

        if isinstance(dim, int):
            dim = (dim,)

        self.norm1 = norm_layer(dim, epsilon=1e-5)
        self.attn = WindowAttention(
            dim, window_size=to_2tuple(self.window_size), num_heads=num_heads,
            qkv_bias=qkv_bias, qk_scale=qk_scale, attn_drop=attn_drop, proj_drop=drop)

        self.drop_path = DropPath(drop_path) if drop_path > 0.0 else Identity()
        self.norm2 = norm_layer(dim, epsilon=1e-5)
        mlp_hidden_dim = int((dim[0] if isinstance(dim, tuple) else dim) * mlp_ratio)
        self.mlp = Mlp(in_features=dim[0] if isinstance(dim, tuple) else dim, hidden_features=mlp_hidden_dim,
                       act_layer=act_layer, drop=drop)
        if self.shift_size > 0:
            # calculate attention mask for SW-MSA
            h_, w_ = self.input_resolution
            img_mask = np.zeros((1, h_, w_, 1))  # 1 H W 1
            h_slices = (slice(0, -self.window_size),
                        slice(-self.window_size, -self.shift_size),
                        slice(-self.shift_size, None))
            w_slices = (slice(0, -self.window_size),
                        slice(-self.window_size, -self.shift_size),
                        slice(-self.shift_size, None))
            cnt = 0
            for h in h_slices:
                for w in w_slices:
                    img_mask[:, h, w, :] = cnt
                    cnt += 1
            # img_mask: [1, 56, 56, 1] window_size: 7
            mask_windows = window_partition(img_mask, self.window_size)  # nW, window_size, window_size, 1
            mask_windows = mask_windows.reshape(-1, self.window_size * self.window_size)
            attn_mask = mask_windows[:, np.newaxis] - mask_windows[:, :, np.newaxis]
            # [64, 49, 49] ==> [1, 64, 1, 49, 49]
            attn_mask = np.expand_dims(attn_mask, axis=1)
            attn_mask = np.expand_dims(attn_mask, axis=0)
            attn_mask = Tensor(np.where(attn_mask == 0, 0.0, -100.0), dtype=mstype.float32)
            self.attn_mask = Parameter(attn_mask, requires_grad=False)
            self.roll_pos = Roll(self.shift_size)
            self.roll_neg = Roll(-self.shift_size)
        else:
            self.attn_mask = None

        self.window_partition = WindowPartition(self.window_size)
        self.window_reverse = WindowReverse()

    def construct(self, x: Tensor) -> Tensor:
        h, w = self.input_resolution
        b, _, c = x.shape

        shortcut = x
        x = self.norm1(x)
        x = ops.reshape(x, (b, h, w, c,))

        # cyclic shift
        if self.shift_size > 0:
            shifted_x = self.roll_neg(x)
            # shifted_x = numpy.roll(x, (-self.shift_size, -self.shift_size), (1, 2))
        else:
            shifted_x = x

        # partition windows
        x_windows = self.window_partition(shifted_x)  # nW*B, window_size, window_size, C
        x_windows = ops.reshape(x_windows,
                                (-1, self.window_size * self.window_size, c,))  # nW*B, window_size*window_size, C

        # W-MSA/SW-MSA
        attn_windows = self.attn(x_windows, mask=self.attn_mask)  # nW*B, window_size*window_size, C

        # merge windows
        attn_windows = ops.reshape(attn_windows, (-1, self.window_size, self.window_size, c,))
        shifted_x = self.window_reverse(attn_windows, self.window_size, h, w)  # B H' W' C

        # reverse cyclic shift
        if self.shift_size > 0:
            x = self.roll_pos(shifted_x)
        else:
            x = shifted_x

        x = ops.reshape(x, (b, h * w, c,))

        # FFN
        x = shortcut + self.drop_path(x)

        x = x + self.drop_path(self.mlp(self.norm2(x)))

        return x

    def extra_repr(self) -> str:
        return f"dim={self.dim}, input_resolution={self.input_resolution}, num_heads={self.num_heads}, " \
               f"window_size={self.window_size}, shift_size={self.shift_size}, mlp_ratio={self.mlp_ratio}"
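The attention-mask construction in `__init__` above can be traced with a small standalone NumPy sketch (a re-derivation for illustration, not the module itself), using a hypothetical 14x14 resolution with window size 7 and shift 3:

```python
import numpy as np

# Hypothetical sizes: a 14x14 feature map, 7x7 windows, shift of 3.
h, w, window_size, shift_size = 14, 14, 7, 3

# Label each pixel with the region it belongs to after the cyclic shift.
img_mask = np.zeros((1, h, w, 1))
slices = (slice(0, -window_size),
          slice(-window_size, -shift_size),
          slice(-shift_size, None))
cnt = 0
for hs in slices:
    for ws in slices:
        img_mask[:, hs, ws, :] = cnt
        cnt += 1

# Partition the label map into windows, exactly like window_partition.
mask = img_mask.reshape(1, h // window_size, window_size,
                        w // window_size, window_size, 1)
mask = mask.transpose(0, 1, 3, 2, 4, 5).reshape(-1, window_size * window_size)

# Tokens from different regions must not attend to each other: -100 drives
# those logits to ~0 after softmax; same-region pairs get 0 (no change).
attn_mask = mask[:, np.newaxis] - mask[:, :, np.newaxis]
attn_mask = np.where(attn_mask == 0, 0.0, -100.0)

print(attn_mask.shape)            # (4, 49, 49)
print((attn_mask[0] == 0).all())  # top-left window is one region: True
```

The bottom-right window mixes pixels from several shifted regions, so its mask contains -100 entries, while the top-left window is homogeneous and fully unmasked.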
mindocr.models.backbones.mindcv_models.swin_transformer.WindowAttention

Bases: nn.Cell

Window based multi-head self attention (W-MSA) Cell with relative position bias. It supports both shifted and non-shifted windows.

PARAMETER DESCRIPTION
dim

Number of input channels.

TYPE: int

window_size

The height and width of the window.

TYPE: tuple[int]

num_heads

Number of attention heads.

TYPE: int

qkv_bias

If True, add a learnable bias to query, key, value. Default: True

TYPE: bool DEFAULT: True

qk_scale

Override default qk scale of head_dim ** -0.5 if set.

TYPE: float | None DEFAULT: None

attn_drop

Dropout ratio of attention weight. Default: 0.0

TYPE: float DEFAULT: 0.0

proj_drop

Dropout ratio of output. Default: 0.0

TYPE: float DEFAULT: 0.0

Source code in mindocr\models\backbones\mindcv_models\swin_transformer.py
class WindowAttention(nn.Cell):
    r"""Window based multi-head self attention (W-MSA) Cell with relative position bias.
    It supports both shifted and non-shifted windows.

    Args:
        dim (int): Number of input channels.
        window_size (tuple[int]): The height and width of the window.
        num_heads (int): Number of attention heads.
        qkv_bias (bool, optional):  If True, add a learnable bias to query, key, value. Default: True
        qk_scale (float | None, optional): Override default qk scale of head_dim ** -0.5 if set
        attn_drop (float, optional): Dropout ratio of attention weight. Default: 0.0
        proj_drop (float, optional): Dropout ratio of output. Default: 0.0
    """

    def __init__(
        self,
        dim: int,
        window_size: int,
        num_heads: int,
        qkv_bias: bool = True,
        qk_scale: Optional[float] = None,
        attn_drop: float = 0.0,
        proj_drop: float = 0.0,
    ) -> None:
        super().__init__()
        if isinstance(dim, tuple) and len(dim) == 1:
            dim = dim[0]
        self.dim = dim
        self.window_size = window_size  # Wh, Ww
        self.num_heads = num_heads
        head_dim = dim // num_heads
        self.scale = Tensor(qk_scale or head_dim**-0.5, mstype.float32)
        self.relative_bias = RelativeBias(self.window_size, num_heads)

        # get pair-wise relative position index for each token inside the window
        self.q = nn.Dense(in_channels=dim, out_channels=dim, has_bias=qkv_bias)
        self.k = nn.Dense(in_channels=dim, out_channels=dim, has_bias=qkv_bias)
        self.v = nn.Dense(in_channels=dim, out_channels=dim, has_bias=qkv_bias)

        self.attn_drop = nn.Dropout(keep_prob=1.0 - attn_drop)
        self.proj = nn.Dense(in_channels=dim, out_channels=dim, has_bias=True)
        self.proj_drop = nn.Dropout(keep_prob=1.0 - proj_drop)
        self.softmax = nn.Softmax(axis=-1)
        self.batch_matmul = ops.BatchMatMul()

    def construct(self, x: Tensor, mask: Optional[Tensor] = None) -> Tensor:
        """
        Args:
            x: input features with shape of (num_windows*B, N, C)
            mask: (0/-inf) mask with shape of (num_windows, Wh*Ww, Wh*Ww) or None
        """
        b_, n, c = x.shape
        q = ops.reshape(self.q(x), (b_, n, self.num_heads, c // self.num_heads)) * self.scale
        q = ops.transpose(q, (0, 2, 1, 3))
        k = ops.reshape(self.k(x), (b_, n, self.num_heads, c // self.num_heads))
        k = ops.transpose(k, (0, 2, 3, 1))
        v = ops.reshape(self.v(x), (b_, n, self.num_heads, c // self.num_heads))
        v = ops.transpose(v, (0, 2, 1, 3))

        attn = self.batch_matmul(q, k)
        attn = attn + self.relative_bias()

        if mask is not None:
            nw = mask.shape[1]
            attn = ops.reshape(attn, (b_ // nw, nw, self.num_heads, n, n,)) + mask
            attn = ops.reshape(attn, (-1, self.num_heads, n, n,))
            attn = self.softmax(attn)
        else:
            attn = self.softmax(attn)
        attn = self.attn_drop(attn)
        x = ops.reshape(ops.transpose(self.batch_matmul(attn, v), (0, 2, 1, 3)), (b_, n, c))
        x = self.proj(x)
        x = self.proj_drop(x)
        return x

    def extra_repr(self) -> str:
        return f"dim={self.dim}, window_size={self.window_size}, num_heads={self.num_heads}"
mindocr.models.backbones.mindcv_models.swin_transformer.WindowAttention.construct(x, mask=None)
PARAMETER DESCRIPTION
x

input features with shape of (num_windows*B, N, C)

TYPE: Tensor

mask

(0/-inf) mask with shape of (num_windows, Wh*Ww, Wh*Ww) or None

TYPE: Optional[Tensor] DEFAULT: None

Source code in mindocr\models\backbones\mindcv_models\swin_transformer.py
def construct(self, x: Tensor, mask: Optional[Tensor] = None) -> Tensor:
    """
    Args:
        x: input features with shape of (num_windows*B, N, C)
        mask: (0/-inf) mask with shape of (num_windows, Wh*Ww, Wh*Ww) or None
    """
    b_, n, c = x.shape
    q = ops.reshape(self.q(x), (b_, n, self.num_heads, c // self.num_heads)) * self.scale
    q = ops.transpose(q, (0, 2, 1, 3))
    k = ops.reshape(self.k(x), (b_, n, self.num_heads, c // self.num_heads))
    k = ops.transpose(k, (0, 2, 3, 1))
    v = ops.reshape(self.v(x), (b_, n, self.num_heads, c // self.num_heads))
    v = ops.transpose(v, (0, 2, 1, 3))

    attn = self.batch_matmul(q, k)
    attn = attn + self.relative_bias()

    if mask is not None:
        nw = mask.shape[1]
        attn = ops.reshape(attn, (b_ // nw, nw, self.num_heads, n, n,)) + mask
        attn = ops.reshape(attn, (-1, self.num_heads, n, n,))
        attn = self.softmax(attn)
    else:
        attn = self.softmax(attn)
    attn = self.attn_drop(attn)
    x = ops.reshape(ops.transpose(self.batch_matmul(attn, v), (0, 2, 1, 3)), (b_, n, c))
    x = self.proj(x)
    x = self.proj_drop(x)
    return x
mindocr.models.backbones.mindcv_models.swin_transformer.WindowPartition

Bases: nn.Cell

Source code in mindocr\models\backbones\mindcv_models\swin_transformer.py
class WindowPartition(nn.Cell):
    def __init__(
        self,
        window_size: int,
    ) -> None:
        super(WindowPartition, self).__init__()

        self.window_size = window_size

    def construct(self, x: Tensor) -> Tensor:
        """
        Args:
            x: (b, h, w, c)
            window_size (int): window size

        Returns:
            windows: Tensor(num_windows*b, window_size, window_size, c)
        """
        b, h, w, c = x.shape
        x = ops.reshape(x, (b, h // self.window_size, self.window_size, w // self.window_size, self.window_size, c))
        x = ops.transpose(x, (0, 1, 3, 2, 4, 5))
        x = ops.reshape(x, (b * h * w // (self.window_size**2), self.window_size, self.window_size, c))

        return x
mindocr.models.backbones.mindcv_models.swin_transformer.WindowPartition.construct(x)
PARAMETER DESCRIPTION
x

(b, h, w, c)

TYPE: Tensor

window_size

window size

TYPE: int

RETURNS DESCRIPTION
windows

Tensor(num_windows*b, window_size, window_size, c)

TYPE: Tensor

Source code in mindocr\models\backbones\mindcv_models\swin_transformer.py
def construct(self, x: Tensor) -> Tensor:
    """
    Args:
        x: (b, h, w, c)
        window_size (int): window size

    Returns:
        windows: Tensor(num_windows*b, window_size, window_size, c)
    """
    b, h, w, c = x.shape
    x = ops.reshape(x, (b, h // self.window_size, self.window_size, w // self.window_size, self.window_size, c))
    x = ops.transpose(x, (0, 1, 3, 2, 4, 5))
    x = ops.reshape(x, (b * h * w // (self.window_size**2), self.window_size, self.window_size, c))

    return x
mindocr.models.backbones.mindcv_models.swin_transformer.WindowReverse

Bases: nn.Cell

Source code in mindocr\models\backbones\mindcv_models\swin_transformer.py
class WindowReverse(nn.Cell):
    def construct(
        self,
        windows: Tensor,
        window_size: int,
        h: int,
        w: int,
    ) -> Tensor:
        """
        Args:
            windows: (num_windows*B, window_size, window_size, C)
            window_size (int): Window size
            h (int): Height of image
            w (int): Width of image

        Returns:
            x: (B, H, W, C)
        """
        b = windows.shape[0] // (h * w // window_size // window_size)
        x = ops.reshape(windows, (b, h // window_size, w // window_size, window_size, window_size, -1))
        x = ops.transpose(x, (0, 1, 3, 2, 4, 5))
        x = ops.reshape(x, (b, h, w, -1))
        return x
mindocr.models.backbones.mindcv_models.swin_transformer.WindowReverse.construct(windows, window_size, h, w)
PARAMETER DESCRIPTION
windows

(num_windows*B, window_size, window_size, C)

TYPE: Tensor

window_size

Window size

TYPE: int

h

Height of image

TYPE: int

w

Width of image

TYPE: int

RETURNS DESCRIPTION
x

(B, H, W, C)

TYPE: Tensor

Source code in mindocr\models\backbones\mindcv_models\swin_transformer.py
def construct(
    self,
    windows: Tensor,
    window_size: int,
    h: int,
    w: int,
) -> Tensor:
    """
    Args:
        windows: (num_windows*B, window_size, window_size, C)
        window_size (int): Window size
        h (int): Height of image
        w (int): Width of image

    Returns:
        x: (B, H, W, C)
    """
    b = windows.shape[0] // (h * w // window_size // window_size)
    x = ops.reshape(windows, (b, h // window_size, w // window_size, window_size, window_size, -1))
    x = ops.transpose(x, (0, 1, 3, 2, 4, 5))
    x = ops.reshape(x, (b, h, w, -1))
    return x
mindocr.models.backbones.mindcv_models.swin_transformer.swin_tiny(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get SwinTransformer tiny model. Refer to the base class 'models.SwinTransformer' for more details.

Source code in mindocr\models\backbones\mindcv_models\swin_transformer.py
@register_model
def swin_tiny(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> SwinTransformer:
    """Get SwinTransformer tiny model.
    Refer to the base class 'models.SwinTransformer' for more details.
    """
    default_cfg = default_cfgs["swin_tiny"]
    model = SwinTransformer(image_size=224, patch_size=4, in_chans=in_channels, num_classes=num_classes,
                            embed_dim=96, depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24], window_size=7,
                            mlp_ratio=4., qkv_bias=True, qk_scale=None,
                            drop_rate=0., attn_drop_rate=0., drop_path_rate=0.2,
                            norm_layer=nn.LayerNorm, ape=False, patch_norm=True, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.swin_transformer.window_partition(x, window_size)
PARAMETER DESCRIPTION
x

(B, H, W, C)

window_size

window size

TYPE: int

RETURNS DESCRIPTION
windows

numpy array of shape (num_windows*B, window_size, window_size, C)

Source code in mindocr\models\backbones\mindcv_models\swin_transformer.py
def window_partition(x, window_size: int):
    """
    Args:
        x: (B, H, W, C)
        window_size (int): window size

    Returns:
        windows: numpy array of shape (num_windows*B, window_size, window_size, C)
    """
    b, h, w, c = x.shape
    x = np.reshape(x, (b, h // window_size, window_size, w // window_size, window_size, c))
    windows = x.transpose(0, 1, 3, 2, 4, 5).reshape(-1, window_size, window_size, c)
    return windows
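The partition/reverse pair above is a pure shape transformation, which a self-contained NumPy sketch can verify round-trips losslessly (hypothetical batch and channel sizes; both helpers are restated here so the snippet runs standalone):

```python
import numpy as np

def window_partition(x, window_size):
    # (b, h, w, c) -> (num_windows*b, window_size, window_size, c)
    b, h, w, c = x.shape
    x = x.reshape(b, h // window_size, window_size, w // window_size, window_size, c)
    return x.transpose(0, 1, 3, 2, 4, 5).reshape(-1, window_size, window_size, c)

def window_reverse(windows, window_size, h, w):
    # Inverse of window_partition: (num_windows*b, ws, ws, c) -> (b, h, w, c)
    b = windows.shape[0] // (h * w // window_size // window_size)
    x = windows.reshape(b, h // window_size, w // window_size, window_size, window_size, -1)
    return x.transpose(0, 1, 3, 2, 4, 5).reshape(b, h, w, -1)

# Hypothetical batch: 2 feature maps of 56x56 with 3 channels, 7x7 windows.
x = np.arange(2 * 56 * 56 * 3, dtype=np.float32).reshape(2, 56, 56, 3)
windows = window_partition(x, 7)
print(windows.shape)  # (128, 7, 7, 3): 2 * (56/7)^2 = 128 windows
print(np.array_equal(window_reverse(windows, 7, 56, 56), x))  # True
```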
mindocr.models.backbones.mindcv_models.utils

Some utils while building models

mindocr.models.backbones.mindcv_models.utils.ConfigDict

Bases: dict

dot.notation access to dictionary attributes

Source code in mindocr\models\backbones\mindcv_models\utils.py
class ConfigDict(dict):
    """dot.notation access to dictionary attributes"""

    __getattr__ = dict.get
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__
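A short usage sketch of the dot-notation access (the class is restated so the snippet runs standalone). Note that because `__getattr__` is bound to `dict.get`, reading an absent key returns `None` rather than raising `AttributeError`:

```python
class ConfigDict(dict):
    """dot.notation access to dictionary attributes"""
    __getattr__ = dict.get
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__

cfg = ConfigDict(lr=0.01, epochs=10)
cfg.batch_size = 32            # attribute writes land in the dict
assert cfg.lr == 0.01
assert cfg["batch_size"] == 32
assert cfg.missing is None     # dict.get: absent keys read as None
del cfg.epochs                 # attribute deletes remove the key
assert "epochs" not in cfg
```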
mindocr.models.backbones.mindcv_models.utils.auto_map(model, param_dict)

Rename part of the param_dict so that parameter names from the checkpoint and the model are consistent

Source code in mindocr\models\backbones\mindcv_models\utils.py
def auto_map(model, param_dict):
    """Rename part of the param_dict so that parameter names from the checkpoint and the model are consistent"""
    updated_param_dict = deepcopy(param_dict)
    net_param = model.get_parameters()
    ckpt_param = list(updated_param_dict.keys())
    remap = {}
    for param in net_param:
        if param.name not in ckpt_param:
            print('Cannot find a param to load: ', param.name)
            poss = difflib.get_close_matches(param.name, ckpt_param, n=3, cutoff=0.6)
            if len(poss) > 0:
                print('=> Find most matched param: ', poss[0], ', loaded')
                updated_param_dict[param.name] = updated_param_dict.pop(poss[0])  # replace
                remap[param.name] = poss[0]
            else:
                raise ValueError('Cannot find any matching param from: ', ckpt_param)

    if remap != {}:
        print('WARNING: Auto mapping succeed. Please check the found mapping names to ensure correctness')
        print('\tNet Param\t<---\tCkpt Param')
        for k in remap:
            print(f'\t{k}\t<---\t{remap[k]}')
    return updated_param_dict
mindocr.models.backbones.mindcv_models.utils.download_pretrained(default_cfg)

Download the pretrained ckpt from url to local path

Source code in mindocr\models\backbones\mindcv_models\utils.py
def download_pretrained(default_cfg):
    """Download the pretrained ckpt from url to local path"""
    if "url" not in default_cfg or not default_cfg["url"]:
        logging.warning("Pretrained model URL is invalid")
        return

    # download files
    download_path = get_checkpoint_download_root()
    os.makedirs(download_path, exist_ok=True)
    file_path = DownLoad().download_url(default_cfg["url"], path=download_path)
    return file_path
mindocr.models.backbones.mindcv_models.utils.load_pretrained(model, default_cfg, num_classes=1000, in_channels=3, filter_fn=None, auto_mapping=False)

Load a pretrained model according to its default config

Source code in mindocr\models\backbones\mindcv_models\utils.py
def load_pretrained(model, default_cfg, num_classes=1000, in_channels=3, filter_fn=None, auto_mapping=False):
    """Load a pretrained model according to its default config"""
    file_path = download_pretrained(default_cfg)

    try:
        param_dict = load_checkpoint(file_path)
    except Exception:
        print(f'ERROR: Fails to load the checkpoint. Please check whether the checkpoint is downloaded successfully as '
              f'`{file_path}` and is not zero-byte. You may try to manually download the checkpoint from ',
              default_cfg["url"])
        param_dict = dict()

    if auto_mapping:
        param_dict = auto_map(model, param_dict)

    if in_channels == 1:
        conv1_name = default_cfg["first_conv"]
        logging.info("Converting first conv (%s) from 3 to 1 channel", conv1_name)
        con1_weight = param_dict[conv1_name + ".weight"]
        con1_weight.set_data(con1_weight.sum(axis=1, keepdims=True), slice_shape=True)
    elif in_channels != 3:
        raise ValueError("Invalid in_channels for pretrained weights")

    if 'classifier' in default_cfg:
        classifier_name = default_cfg["classifier"]
        if num_classes == 1000 and default_cfg["num_classes"] == 1001:
            classifier_weight = param_dict[classifier_name + ".weight"]
            classifier_weight.set_data(classifier_weight[:1000], slice_shape=True)
            classifier_bias = param_dict[classifier_name + ".bias"]
            classifier_bias.set_data(classifier_bias[:1000], slice_shape=True)
        elif num_classes != default_cfg["num_classes"]:
            params_names = list(param_dict.keys())
            param_dict.pop(
                _search_param_name(params_names, classifier_name + ".weight"),
                "No Parameter {} in ParamDict".format(classifier_name + ".weight"),
            )
            param_dict.pop(
                _search_param_name(params_names, classifier_name + ".bias"),
                "No Parameter {} in ParamDict".format(classifier_name + ".bias"),
            )

    if filter_fn is not None:
        param_dict = filter_fn(param_dict)

    load_param_into_net(model, param_dict)

    print('INFO: Finish loading model checkpoint from: ', file_path)
mindocr.models.backbones.mindcv_models.utils.make_divisible(v, divisor, min_value=None)

Round v to the nearest multiple of divisor, never going below min_value or more than 10% below v.

Source code in mindocr\models\backbones\mindcv_models\utils.py
def make_divisible(
    v: float,
    divisor: int,
    min_value: Optional[int] = None,
) -> int:
    """Round v to the nearest multiple of divisor, never going below min_value or more than 10% below v."""
    if not min_value:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v
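The rounding behavior is easiest to see on a few concrete inputs (the helper is restated so the snippet runs standalone):

```python
def make_divisible(v, divisor, min_value=None):
    # Round v to the nearest multiple of divisor, never below min_value,
    # and never more than 10% below v (bump up one divisor if so).
    if not min_value:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

print(make_divisible(32, 8))  # 32: already divisible
print(make_divisible(30, 8))  # 32: rounds to the nearest multiple
print(make_divisible(10, 8))  # 16: 8 would shrink >10%, so bump up a divisor
print(make_divisible(3, 8))   # 8: clamped up to min_value
```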
mindocr.models.backbones.mindcv_models.vgg

MindSpore implementation of VGGNet. Refer to: Very Deep Convolutional Networks for Large-Scale Image Recognition.

mindocr.models.backbones.mindcv_models.vgg.VGG

Bases: nn.Cell

VGGNet model class, based on "Very Deep Convolutional Networks for Large-Scale Image Recognition" <https://arxiv.org/abs/1409.1556>_

PARAMETER DESCRIPTION
model_name

name of the architecture. 'vgg11', 'vgg13', 'vgg16' or 'vgg19'.

TYPE: str

batch_norm

use batch normalization or not. Default: False.

TYPE: bool DEFAULT: False

num_classes

number of classification classes. Default: 1000.

TYPE: int DEFAULT: 1000

in_channels

number of input channels. Default: 3.

TYPE: int DEFAULT: 3

drop_rate

dropout rate of the classifier. Default: 0.5.

TYPE: float DEFAULT: 0.5

Source code in mindocr\models\backbones\mindcv_models\vgg.py
class VGG(nn.Cell):
    r"""VGGNet model class, based on
    `"Very Deep Convolutional Networks for Large-Scale Image Recognition" <https://arxiv.org/abs/1409.1556>`_

    Args:
        model_name: name of the architecture. 'vgg11', 'vgg13', 'vgg16' or 'vgg19'.
        batch_norm: use batch normalization or not. Default: False.
        num_classes: number of classification classes. Default: 1000.
        in_channels: number of input channels. Default: 3.
        drop_rate: dropout rate of the classifier. Default: 0.5.
    """

    def __init__(
        self,
        model_name: str,
        batch_norm: bool = False,
        num_classes: int = 1000,
        in_channels: int = 3,
        drop_rate: float = 0.5,
    ) -> None:
        super().__init__()
        cfg = cfgs[model_name]
        self.features = _make_layers(cfg, batch_norm=batch_norm, in_channels=in_channels)
        self.flatten = nn.Flatten()
        self.classifier = nn.SequentialCell([
            nn.Dense(512 * 7 * 7, 4096),
            nn.ReLU(),
            nn.Dropout(keep_prob=1 - drop_rate),
            nn.Dense(4096, 4096),
            nn.ReLU(),
            nn.Dropout(keep_prob=1 - drop_rate),
            nn.Dense(4096, num_classes),
        ])
        self._initialize_weights()

    def _initialize_weights(self) -> None:
        """Initialize weights for cells."""
        for _, cell in self.cells_and_names():
            if isinstance(cell, nn.Conv2d):
                cell.weight.set_data(
                    init.initializer(init.HeNormal(math.sqrt(5), mode="fan_out", nonlinearity="relu"),
                                     cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(
                        init.initializer("zeros", cell.bias.shape, cell.bias.dtype))
            elif isinstance(cell, nn.Dense):
                cell.weight.set_data(
                    init.initializer(init.Normal(0.01), cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(init.initializer("zeros", cell.bias.shape, cell.bias.dtype))

    def forward_features(self, x: Tensor) -> Tensor:
        x = self.features(x)
        return x

    def forward_head(self, x: Tensor) -> Tensor:
        x = self.flatten(x)
        x = self.classifier(x)
        return x

    def construct(self, x: Tensor) -> Tensor:
        x = self.forward_features(x)
        x = self.forward_head(x)
        return x
mindocr.models.backbones.mindcv_models.vgg.vgg11(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get 11 layers VGG model. Refer to the base class models.VGG for more details.

Source code in mindocr\models\backbones\mindcv_models\vgg.py
@register_model
def vgg11(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> VGG:
    """Get 11 layers VGG model.
    Refer to the base class `models.VGG` for more details.
    """
    default_cfg = default_cfgs["vgg11"]
    model = VGG(model_name="vgg11", num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.vgg.vgg13(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get 13 layers VGG model. Refer to the base class models.VGG for more details.

Source code in mindocr\models\backbones\mindcv_models\vgg.py
@register_model
def vgg13(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> VGG:
    """Get 13 layers VGG model.
    Refer to the base class `models.VGG` for more details.
    """
    default_cfg = default_cfgs["vgg13"]
    model = VGG(model_name="vgg13", num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.vgg.vgg16(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get 16 layers VGG model. Refer to the base class models.VGG for more details.

Source code in mindocr\models\backbones\mindcv_models\vgg.py
@register_model
def vgg16(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> VGG:
    """Get 16 layers VGG model.
    Refer to the base class `models.VGG` for more details.
    """
    default_cfg = default_cfgs["vgg16"]
    model = VGG(model_name="vgg16", num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.vgg.vgg19(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get 19 layers VGG model. Refer to the base class models.VGG for more details.

Source code in mindocr\models\backbones\mindcv_models\vgg.py
@register_model
def vgg19(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> VGG:
    """Get 19 layers VGG model.
    Refer to the base class `models.VGG` for more details.
    """
    default_cfg = default_cfgs["vgg19"]
    model = VGG(model_name="vgg19", num_classes=num_classes, in_channels=in_channels, **kwargs)

    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.visformer

MindSpore implementation of Visformer. Refer to: Visformer: The Vision-friendly Transformer

mindocr.models.backbones.mindcv_models.visformer.Attention

Bases: nn.Cell

Attention layer

Source code in mindocr\models\backbones\mindcv_models\visformer.py
class Attention(nn.Cell):
    """Attention layer"""

    def __init__(
        self,
        dim: int,
        num_heads: int = 8,
        head_dim_ratio: float = 1.0,
        qkv_bias: bool = False,
        qk_scale: float = None,
        attn_drop: float = 0.0,
        proj_drop: float = 0.0,
    ) -> None:
        super(Attention, self).__init__()
        self.dim = dim
        self.num_heads = num_heads
        head_dim = round(dim // num_heads * head_dim_ratio)
        self.head_dim = head_dim

        qk_scale_factor = qk_scale if qk_scale is not None else -0.25
        self.scale = head_dim**qk_scale_factor

        self.qkv = nn.Conv2d(dim, head_dim * num_heads * 3, 1, 1, pad_mode="pad", padding=0, has_bias=qkv_bias)
        self.attn_drop = nn.Dropout(1 - attn_drop)
        self.proj = nn.Conv2d(self.head_dim * self.num_heads, dim, 1, 1, pad_mode="pad", padding=0)
        self.proj_drop = nn.Dropout(1 - proj_drop)

    def construct(self, x: Tensor) -> Tensor:
        B, C, H, W = x.shape
        x = self.qkv(x)
        qkv = ops.reshape(x, (B, 3, self.num_heads, self.head_dim, H * W))
        qkv = qkv.transpose((1, 0, 2, 4, 3))
        q, k, v = qkv[0], qkv[1], qkv[2]
        attn = ops.matmul(q * self.scale, k.transpose(0, 1, 3, 2) * self.scale)
        attn = ops.Softmax(axis=-1)(attn)
        attn = self.attn_drop(attn)
        x = ops.matmul(attn, v)

        x = x.transpose((0, 1, 3, 2)).reshape((B, -1, H, W))
        x = self.proj(x)
        x = self.proj_drop(x)

        return x
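Note that this Attention applies the scale to both q and k (head_dim ** -0.25 each by default) rather than scaling the attention logits once by head_dim ** -0.5; the two forms are numerically equivalent. A minimal NumPy sketch of that equivalence (shapes are illustrative, not taken from mindocr):

```python
import numpy as np

# Hedged sketch: Visformer's split scaling (applied to both q and k)
# equals the conventional single scaling of the attention logits.
head_dim = 64
scale = head_dim ** -0.25          # default qk_scale_factor = -0.25

rng = np.random.default_rng(0)
q = rng.standard_normal((2, 8, 16, head_dim))  # (B, heads, tokens, head_dim)
k = rng.standard_normal((2, 8, 16, head_dim))

# Visformer form: (q * s) @ (k * s)^T with s = head_dim ** -0.25
logits_split = (q * scale) @ (k * scale).transpose(0, 1, 3, 2)
# Conventional form: (q @ k^T) * head_dim ** -0.5
logits_single = (q @ k.transpose(0, 1, 3, 2)) * head_dim ** -0.5

print(np.allclose(logits_split, logits_single))  # True
```

Splitting the factor keeps the intermediate products smaller in magnitude, which can help in reduced precision.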
mindocr.models.backbones.mindcv_models.visformer.Block

Bases: nn.Cell

visformer basic block

Source code in mindocr\models\backbones\mindcv_models\visformer.py
class Block(nn.Cell):
    """visformer basic block"""

    def __init__(
        self,
        dim: int,
        num_heads: int,
        head_dim_ratio: float = 1.0,
        mlp_ratio: float = 4.0,
        qkv_bias: bool = False,
        qk_scale: float = None,
        drop: float = 0.0,
        attn_drop: float = 0.0,
        drop_path: float = 0.0,
        act_layer: nn.Cell = nn.GELU,
        group: int = 8,
        attn_disabled: bool = False,
        spatial_conv: bool = False,
    ) -> None:
        super(Block, self).__init__()
        self.attn_disabled = attn_disabled
        self.spatial_conv = spatial_conv
        self.drop_path = DropPath(drop_path) if drop_path > 0.0 else Identity()
        if not attn_disabled:
            self.norm1 = nn.BatchNorm2d(dim)
            self.attn = Attention(dim, num_heads=num_heads, head_dim_ratio=head_dim_ratio, qkv_bias=qkv_bias,
                                  qk_scale=qk_scale, attn_drop=attn_drop, proj_drop=drop)

        self.norm2 = nn.BatchNorm2d(dim)
        mlp_hidden_dim = int(dim * mlp_ratio)
        self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop,
                       group=group, spatial_conv=spatial_conv)

    def construct(self, x: Tensor) -> Tensor:
        if not self.attn_disabled:
            x = x + self.drop_path(self.attn(self.norm1(x)))
        x = x + self.drop_path(self.mlp(self.norm2(x)))
        return x
mindocr.models.backbones.mindcv_models.visformer.Mlp

Bases: nn.Cell

MLP layer

Source code in mindocr\models\backbones\mindcv_models\visformer.py
class Mlp(nn.Cell):
    """MLP layer"""

    def __init__(
        self,
        in_features: int,
        hidden_features: int = None,
        out_features: int = None,
        act_layer: nn.Cell = nn.GELU,
        drop: float = 0.0,
        group: int = 8,
        spatial_conv: bool = False,
    ) -> None:
        super(Mlp, self).__init__()
        out_features = out_features or in_features
        hidden_features = hidden_features or in_features
        self.in_features = in_features
        self.out_features = out_features
        self.spatial_conv = spatial_conv
        if self.spatial_conv:
            if group < 2:
                hidden_features = in_features * 5 // 6
            else:
                hidden_features = in_features * 2
        self.hidden_features = hidden_features
        self.group = group
        self.drop = nn.Dropout(1 - drop)
        self.conv1 = nn.Conv2d(in_features, hidden_features, 1, 1, pad_mode="pad", padding=0)
        self.act1 = act_layer()
        if self.spatial_conv:
            self.conv2 = nn.Conv2d(hidden_features, hidden_features, 3, 1, pad_mode="pad", padding=1, group=self.group)
            self.act2 = act_layer()
        self.conv3 = nn.Conv2d(hidden_features, out_features, 1, 1, pad_mode="pad", padding=0)

    def construct(self, x: Tensor) -> Tensor:
        x = self.conv1(x)
        x = self.act1(x)
        x = self.drop(x)

        if self.spatial_conv:
            x = self.conv2(x)
            x = self.act2(x)

        x = self.conv3(x)
        x = self.drop(x)
        return x
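When spatial_conv is enabled, Mlp overrides the requested hidden width: 5/6 of in_features when the 3x3 conv is ungrouped, 2x otherwise. A pure-Python sketch of that rule (`hidden_width` is a hypothetical helper, not part of mindocr):

```python
# Hedged sketch of Mlp's hidden-width rule when spatial_conv=True.
def hidden_width(in_features: int, group: int) -> int:
    if group < 2:                    # an ungrouped 3x3 conv is expensive,
        return in_features * 5 // 6  # so the hidden layer is shrunk
    return in_features * 2           # a grouped conv is cheap: widen instead

print(hidden_width(96, 1))  # 80
print(hidden_width(96, 8))  # 192
```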
mindocr.models.backbones.mindcv_models.visformer.Visformer

Bases: nn.Cell

Visformer model class, based on '"Visformer: The Vision-friendly Transformer" https://arxiv.org/pdf/2104.12533.pdf'

PARAMETER DESCRIPTION
img_size

input image size. Default: 224.

TYPE: int

init_channels

number of initial channels produced by the stem. Default: 32.

TYPE: int

num_classes

number of classification classes. Default: 1000.

TYPE: int DEFAULT: 1000

embed_dim

embedding dimension in all heads. Default: 384.

TYPE: int DEFAULT: 384

depth

model block depth. Default: None.

TYPE: List[int] DEFAULT: None

num_heads

number of heads. Default: None.

TYPE: List[int] DEFAULT: None

mlp_ratio

ratio of hidden features in Mlp. Default: 4.

TYPE: float DEFAULT: 4.0

qkv_bias

whether the qkv layers have a bias. Default: False.

TYPE: bool DEFAULT: False

qk_scale

override the default qk scale of head_dim ** -0.5 if set.

TYPE: float DEFAULT: None

drop_rate

dropout rate. Default: 0.

TYPE: float DEFAULT: 0.0

attn_drop_rate

attention layers dropout rate. Default: 0.

TYPE: float DEFAULT: 0.0

drop_path_rate

drop path rate. Default: 0.1.

TYPE: float DEFAULT: 0.1

attn_stage

a block has an attention layer if its stage's value is '1'. Default: '1111'.

TYPE: str DEFAULT: '1111'

pos_embed

position embedding. Default: True.

TYPE: bool DEFAULT: True

spatial_conv

a block has a spatial convolution layer if its stage's value is '1'. Default: '1111'.

TYPE: str DEFAULT: '1111'

group

convolution group. Default: 8.

TYPE: int DEFAULT: 8

pool

if True, use global average pooling. Default: True.

TYPE: bool DEFAULT: True

conv_init

if True, initialize convolution weights with HeNormal; otherwise TruncatedNormal. Default: False.

DEFAULT: False

Source code in mindocr\models\backbones\mindcv_models\visformer.py
class Visformer(nn.Cell):
    r"""Visformer model class, based on
    '"Visformer: The Vision-friendly Transformer"
    <https://arxiv.org/pdf/2104.12533.pdf>'

    Args:
        img_size (int) : input image size. Default: 224.
        init_channels (int) : number of initial channels produced by the stem. Default: 32.
        num_classes (int) : number of classification classes. Default: 1000.
        embed_dim (int) : embedding dimension in all heads. Default: 384.
        depth (List[int]) : model block depth. Default: None.
        num_heads (List[int]) : number of heads. Default: None.
        mlp_ratio (float) : ratio of hidden features in Mlp. Default: 4.
        qkv_bias (bool) : whether the qkv layers have a bias. Default: False.
        qk_scale (float) : override the default qk scale of head_dim ** -0.5 if set.
        drop_rate (float) : dropout rate. Default: 0.
        attn_drop_rate (float) : attention layers dropout rate. Default: 0.
        drop_path_rate (float) : drop path rate. Default: 0.1.
        attn_stage (str) : a block has an attention layer if its stage's value is '1'. Default: '1111'.
        pos_embed (bool) : position embedding. Default: True.
        spatial_conv (str) : a block has a spatial convolution layer if its stage's value is '1'. Default: '1111'.
        group (int) : convolution group. Default: 8.
        pool (bool) : if True, use global average pooling. Default: True.
        conv_init (bool) : if True, initialize convolution weights with HeNormal; otherwise TruncatedNormal. Default: False.
    """

    def __init__(
        self,
        img_size: int = 224,
        init_channels: int = 32,
        num_classes: int = 1000,
        embed_dim: int = 384,
        depth: List[int] = None,
        num_heads: List[int] = None,
        mlp_ratio: float = 4.0,
        qkv_bias: bool = False,
        qk_scale: float = None,
        drop_rate: float = 0.0,
        attn_drop_rate: float = 0.0,
        drop_path_rate: float = 0.1,
        attn_stage: str = "1111",
        pos_embed: bool = True,
        spatial_conv: str = "1111",
        group: int = 8,
        pool: bool = True,
        conv_init: bool = False,
    ) -> None:
        super(Visformer, self).__init__()
        self.num_classes = num_classes
        self.num_features = self.embed_dim = embed_dim
        self.init_channels = init_channels
        self.img_size = img_size
        self.pool = pool
        self.conv_init = conv_init
        self.depth = depth
        assert (isinstance(depth, list) or isinstance(depth, tuple)) and len(depth) == 4
        if not (isinstance(num_heads, list) or isinstance(num_heads, tuple)):
            num_heads = [num_heads] * 4

        self.pos_embed = pos_embed
        dpr = np.linspace(0, drop_path_rate, sum(depth)).tolist()

        self.stem = nn.SequentialCell([
            nn.Conv2d(3, self.init_channels, 7, 2, pad_mode="pad", padding=3),
            nn.BatchNorm2d(self.init_channels),
            nn.ReLU()
        ])
        img_size //= 2

        self.pos_drop = nn.Dropout(1 - drop_rate)
        # stage0
        if depth[0]:
            self.patch_embed0 = PatchEmbed(img_size=img_size, patch_size=2, in_chans=self.init_channels,
                                           embed_dim=embed_dim // 4)
            img_size //= 2
            if self.pos_embed:
                self.pos_embed0 = mindspore.Parameter(
                    ops.zeros((1, embed_dim // 4, img_size, img_size), mindspore.float32))
            self.stage0 = nn.CellList([
                Block(dim=embed_dim // 4, num_heads=num_heads[0], head_dim_ratio=0.25, mlp_ratio=mlp_ratio,
                      qkv_bias=qkv_bias, qk_scale=qk_scale, drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[i],
                      group=group, attn_disabled=(attn_stage[0] == "0"), spatial_conv=(spatial_conv[0] == "1"))
                for i in range(depth[0])
            ])

        # stage1
        if depth[0]:
            self.patch_embed1 = PatchEmbed(img_size=img_size, patch_size=2, in_chans=embed_dim // 4,
                                           embed_dim=embed_dim // 2)
            img_size //= 2
        else:
            self.patch_embed1 = PatchEmbed(img_size=img_size, patch_size=4, in_chans=self.init_channels,
                                           embed_dim=embed_dim // 2)
            img_size //= 4

        if self.pos_embed:
            self.pos_embed1 = mindspore.Parameter(ops.zeros((1, embed_dim // 2, img_size, img_size), mindspore.float32))

        self.stage1 = nn.CellList([
            Block(
                dim=embed_dim // 2, num_heads=num_heads[1], head_dim_ratio=0.5, mlp_ratio=mlp_ratio,
                qkv_bias=qkv_bias, qk_scale=qk_scale, drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[i],
                group=group, attn_disabled=(attn_stage[1] == "0"), spatial_conv=(spatial_conv[1] == "1")
            )
            for i in range(sum(depth[:1]), sum(depth[:2]))
        ])

        # stage2
        self.patch_embed2 = PatchEmbed(img_size=img_size, patch_size=2, in_chans=embed_dim // 2, embed_dim=embed_dim)
        img_size //= 2
        if self.pos_embed:
            self.pos_embed2 = mindspore.Parameter(ops.zeros((1, embed_dim, img_size, img_size), mindspore.float32))
        self.stage2 = nn.CellList([
            Block(
                dim=embed_dim, num_heads=num_heads[2], head_dim_ratio=1.0, mlp_ratio=mlp_ratio,
                qkv_bias=qkv_bias, qk_scale=qk_scale, drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[i],
                group=group, attn_disabled=(attn_stage[2] == "0"), spatial_conv=(spatial_conv[2] == "1")
            )
            for i in range(sum(depth[:2]), sum(depth[:3]))
        ])

        # stage3
        self.patch_embed3 = PatchEmbed(img_size=img_size, patch_size=2, in_chans=embed_dim, embed_dim=embed_dim * 2)
        img_size //= 2
        if self.pos_embed:
            self.pos_embed3 = mindspore.Parameter(ops.zeros((1, embed_dim * 2, img_size, img_size), mindspore.float32))
        self.stage3 = nn.CellList([
            Block(
                dim=embed_dim * 2, num_heads=num_heads[3], head_dim_ratio=1.0, mlp_ratio=mlp_ratio,
                qkv_bias=qkv_bias, qk_scale=qk_scale, drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[i],
                group=group, attn_disabled=(attn_stage[3] == "0"), spatial_conv=(spatial_conv[3] == "1")
            )
            for i in range(sum(depth[:3]), sum(depth[:4]))
        ])

        # head
        if self.pool:
            self.global_pooling = GlobalAvgPooling()

        self.norm = nn.BatchNorm2d(embed_dim * 2)
        self.head = nn.Dense(embed_dim * 2, num_classes)

        # weight init
        if self.pos_embed:
            if depth[0]:
                self.pos_embed0.set_data(initializer(TruncatedNormal(0.02),
                                                     self.pos_embed0.shape, self.pos_embed0.dtype))
            self.pos_embed1.set_data(initializer(TruncatedNormal(0.02),
                                                 self.pos_embed1.shape, self.pos_embed1.dtype))
            self.pos_embed2.set_data(initializer(TruncatedNormal(0.02),
                                                 self.pos_embed2.shape, self.pos_embed2.dtype))
            self.pos_embed3.set_data(initializer(TruncatedNormal(0.02),
                                                 self.pos_embed3.shape, self.pos_embed3.dtype))
        self._initialize_weights()

    def _initialize_weights(self) -> None:
        for _, cell in self.cells_and_names():
            if isinstance(cell, nn.Dense):
                cell.weight.set_data(initializer(TruncatedNormal(0.02), cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(initializer(Constant(0), cell.bias.shape, cell.bias.dtype))
            elif isinstance(cell, nn.LayerNorm):
                cell.beta.set_data(initializer(Constant(0), cell.beta.shape, cell.beta.dtype))
                cell.gamma.set_data(initializer(Constant(1), cell.gamma.shape, cell.gamma.dtype))
            elif isinstance(cell, nn.BatchNorm2d):
                cell.beta.set_data(initializer(Constant(0), cell.beta.shape, cell.beta.dtype))
                cell.gamma.set_data(initializer(Constant(1), cell.gamma.shape, cell.gamma.dtype))
            elif isinstance(cell, nn.Conv2d):
                if self.conv_init:
                    cell.weight.set_data(initializer(HeNormal(mode="fan_out", nonlinearity="relu"), cell.weight.shape,
                                                     cell.weight.dtype))
                else:
                    cell.weight.set_data(initializer(TruncatedNormal(0.02), cell.weight.shape, cell.weight.dtype))
                if cell.bias is not None:
                    cell.bias.set_data(initializer(Constant(0), cell.bias.shape, cell.bias.dtype))

    def forward_features(self, x: Tensor) -> Tensor:
        x = self.stem(x)

        # stage 0
        if self.depth[0]:
            x = self.patch_embed0(x)
            if self.pos_embed:
                x = x + self.pos_embed0
                x = self.pos_drop(x)
            for b in self.stage0:
                x = b(x)

        # stage 1
        x = self.patch_embed1(x)
        if self.pos_embed:
            x = x + self.pos_embed1
            x = self.pos_drop(x)
        for b in self.stage1:
            x = b(x)

        # stage 2
        x = self.patch_embed2(x)
        if self.pos_embed:
            x = x + self.pos_embed2
            x = self.pos_drop(x)
        for b in self.stage2:
            x = b(x)

        # stage 3
        x = self.patch_embed3(x)
        if self.pos_embed:
            x = x + self.pos_embed3
            x = self.pos_drop(x)
        for b in self.stage3:
            x = b(x)
        x = self.norm(x)
        return x

    def forward_head(self, x: Tensor) -> Tensor:
        # head
        if self.pool:
            x = self.global_pooling(x)
        else:
            x = x[:, :, 0, 0]
        x = self.head(x.view(x.shape[0], -1))
        return x

    def construct(self, x: Tensor) -> Tensor:
        x = self.forward_features(x)
        x = self.forward_head(x)
        return x
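The stem and the successive patch embeddings above shrink the spatial resolution stage by stage: the stem halves it, then patch_embed1 divides by 2 or by 4 depending on whether stage0 exists, and each later embedding halves it again. A pure-Python sketch of that `//` arithmetic for a 224x224 input (`stage_sizes` is a hypothetical helper, not part of mindocr):

```python
# Hedged sketch: spatial resolutions produced by Visformer's stem and
# patch embeddings, mirroring the integer division in __init__.
def stage_sizes(img_size: int, has_stage0: bool):
    sizes = []
    img_size //= 2                     # stem: 7x7 conv, stride 2
    if has_stage0:
        img_size //= 2                 # patch_embed0, patch size 2
        sizes.append(img_size)
        img_size //= 2                 # patch_embed1, patch size 2
    else:
        img_size //= 4                 # patch_embed1, patch size 4
    sizes.append(img_size)
    img_size //= 2                     # patch_embed2
    sizes.append(img_size)
    img_size //= 2                     # patch_embed3
    sizes.append(img_size)
    return sizes

print(stage_sizes(224, has_stage0=False))  # [28, 14, 7]
print(stage_sizes(224, has_stage0=True))   # [56, 28, 14, 7]
```

With depth[0] == 0 (as in visformer_tiny and visformer_small), stage0 is skipped and the 4x patch embedding takes its place.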
mindocr.models.backbones.mindcv_models.visformer.visformer_small(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get visformer small model. Refer to the base class 'models.visformer' for more details.

Source code in mindocr\models\backbones\mindcv_models\visformer.py
@register_model
def visformer_small(pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3, **kwargs):
    """Get visformer small model.
    Refer to the base class 'models.visformer' for more details.
    """
    default_cfg = default_cfgs["visformer_small"]
    model = Visformer(img_size=224, init_channels=32, num_classes=num_classes, embed_dim=384,
                      depth=[0, 7, 4, 4], num_heads=[6, 6, 6, 6], mlp_ratio=4., group=8,
                      attn_stage="0011", spatial_conv="1100", conv_init=True, **kwargs)
    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)
    return model
mindocr.models.backbones.mindcv_models.visformer.visformer_small_v2(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get visformer small2 model. Refer to the base class 'models.visformer' for more details.

Source code in mindocr\models\backbones\mindcv_models\visformer.py
@register_model
def visformer_small_v2(pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3, **kwargs):
    """Get visformer small2 model.
    Refer to the base class 'models.visformer' for more details.
    """
    default_cfg = default_cfgs["visformer_small_v2"]
    model = Visformer(img_size=224, init_channels=32, num_classes=num_classes, embed_dim=256,
                      depth=[1, 10, 14, 3], num_heads=[2, 4, 8, 16], mlp_ratio=4., qk_scale=-0.5,
                      group=8, attn_stage="0011", spatial_conv="1100", conv_init=True, **kwargs)
    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)
    return model
mindocr.models.backbones.mindcv_models.visformer.visformer_tiny(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get visformer tiny model. Refer to the base class 'models.visformer' for more details.

Source code in mindocr\models\backbones\mindcv_models\visformer.py
@register_model
def visformer_tiny(pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3, **kwargs):
    """Get visformer tiny model.
    Refer to the base class 'models.visformer' for more details.
    """
    default_cfg = default_cfgs["visformer_tiny"]
    model = Visformer(img_size=224, init_channels=16, num_classes=num_classes, embed_dim=192,
                      depth=[0, 7, 4, 4], num_heads=[3, 3, 3, 3], mlp_ratio=4., group=8,
                      attn_stage="0011", spatial_conv="1100", drop_path_rate=0.03, conv_init=True, **kwargs)
    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_models.visformer.visformer_tiny_v2(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get visformer tiny2 model. Refer to the base class 'models.visformer' for more details.

Source code in mindocr\models\backbones\mindcv_models\visformer.py
@register_model
def visformer_tiny_v2(pretrained: bool = False, num_classes: int = 1000, in_channels: int = 3, **kwargs):
    """Get visformer tiny2 model.
    Refer to the base class 'models.visformer' for more details.
    """
    default_cfg = default_cfgs["visformer_tiny_v2"]
    model = Visformer(img_size=224, init_channels=24, num_classes=num_classes, embed_dim=192,
                      depth=[1, 4, 6, 3], num_heads=[1, 3, 6, 12], mlp_ratio=4., qk_scale=-0.5, group=8,
                      attn_stage="0011", spatial_conv="1100", drop_path_rate=0.03, conv_init=True, **kwargs)
    if pretrained:
        load_pretrained(model, default_cfg, num_classes=num_classes, in_channels=in_channels)
    return model
mindocr.models.backbones.mindcv_models.vit

ViT

mindocr.models.backbones.mindcv_models.vit.Attention

Bases: nn.Cell

Attention layer implementation, Rearrange Input -> B x N x hidden size.

PARAMETER DESCRIPTION
dim

The dimension of input features.

TYPE: int

num_heads

The number of attention heads. Default: 8.

TYPE: int DEFAULT: 8

keep_prob

The keep rate, greater than 0 and less equal than 1. Default: 1.0.

TYPE: float DEFAULT: 1.0

attention_keep_prob

The keep rate for attention. Default: 1.0.

TYPE: float DEFAULT: 1.0

RETURNS DESCRIPTION

Tensor, output tensor.

Examples:

>>> ops = Attention(768, 12)
Source code in mindocr\models\backbones\mindcv_models\vit.py
class Attention(nn.Cell):
    """
    Attention layer implementation, Rearrange Input -> B x N x hidden size.

    Args:
        dim (int): The dimension of input features.
        num_heads (int): The number of attention heads. Default: 8.
        keep_prob (float): The keep rate, greater than 0 and less equal than 1. Default: 1.0.
        attention_keep_prob (float): The keep rate for attention. Default: 1.0.

    Returns:
        Tensor, output tensor.

    Examples:
        >>> ops = Attention(768, 12)
    """

    def __init__(
        self,
        dim: int,
        num_heads: int = 8,
        keep_prob: float = 1.0,
        attention_keep_prob: float = 1.0,
    ):
        super().__init__()
        self.num_heads = num_heads
        head_dim = dim // num_heads
        self.scale = Tensor(head_dim**-0.5)

        self.qkv = nn.Dense(dim, dim * 3)
        self.attn_drop = nn.Dropout(attention_keep_prob)
        self.out = nn.Dense(dim, dim)
        self.out_drop = nn.Dropout(keep_prob)

        self.mul = ops.Mul()
        self.reshape = ops.Reshape()
        self.transpose = ops.Transpose()
        self.unstack = ops.Unstack(axis=0)
        self.attn_matmul_v = ops.BatchMatMul()
        self.q_matmul_k = ops.BatchMatMul(transpose_b=True)
        self.softmax = nn.Softmax(axis=-1)

    def construct(self, x):
        """Attention construct."""
        b, n, c = x.shape
        qkv = self.qkv(x)
        qkv = self.reshape(qkv, (b, n, 3, self.num_heads, c // self.num_heads))
        qkv = self.transpose(qkv, (2, 0, 3, 1, 4))
        q, k, v = self.unstack(qkv)

        attn = self.q_matmul_k(q, k)
        attn = self.mul(attn, self.scale)
        attn = self.softmax(attn)
        attn = self.attn_drop(attn)

        out = self.attn_matmul_v(attn, v)
        out = self.transpose(out, (0, 2, 1, 3))
        out = self.reshape(out, (b, n, c))
        out = self.out(out)
        out = self.out_drop(out)

        return out
mindocr.models.backbones.mindcv_models.vit.Attention.construct(x)

Attention construct.

Source code in mindocr\models\backbones\mindcv_models\vit.py
def construct(self, x):
    """Attention construct."""
    b, n, c = x.shape
    qkv = self.qkv(x)
    qkv = self.reshape(qkv, (b, n, 3, self.num_heads, c // self.num_heads))
    qkv = self.transpose(qkv, (2, 0, 3, 1, 4))
    q, k, v = self.unstack(qkv)

    attn = self.q_matmul_k(q, k)
    attn = self.mul(attn, self.scale)
    attn = self.softmax(attn)
    attn = self.attn_drop(attn)

    out = self.attn_matmul_v(attn, v)
    out = self.transpose(out, (0, 2, 1, 3))
    out = self.reshape(out, (b, n, c))
    out = self.out(out)
    out = self.out_drop(out)

    return out
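The reshape/transpose shuffle above splits the single fused (B, N, 3*C) qkv projection into per-head q, k, and v. A NumPy sketch of the shape bookkeeping (indexing the first axis plays the role of ops.Unstack; values are illustrative):

```python
import numpy as np

# Hedged sketch of the qkv split in Attention.construct.
b, n, c, num_heads = 2, 5, 12, 3
qkv = np.arange(b * n * 3 * c, dtype=np.float32).reshape(b, n, 3 * c)

qkv = qkv.reshape(b, n, 3, num_heads, c // num_heads)
qkv = qkv.transpose(2, 0, 3, 1, 4)   # -> (3, B, heads, N, head_dim)
q, k, v = qkv[0], qkv[1], qkv[2]     # analogue of ops.Unstack(axis=0)

print(q.shape)  # (2, 3, 5, 4): (B, heads, N, head_dim)
```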
mindocr.models.backbones.mindcv_models.vit.BaseClassifier

Bases: nn.Cell

generate classifier to combine the backbone and head

Source code in mindocr\models\backbones\mindcv_models\vit.py
class BaseClassifier(nn.Cell):
    """
    generate classifier to combine the backbone and head
    """

    def __init__(self, backbone, neck=None, head=None):
        super().__init__()
        self.backbone = backbone
        if neck:
            self.neck = neck
            self.with_neck = True
        else:
            self.with_neck = False
        if head:
            self.head = head
            self.with_head = True
        else:
            self.with_head = False

    def forward_features(self, x: Tensor) -> Tensor:
        x = self.backbone(x)
        return x

    def forward_head(self, x: Tensor) -> Tensor:
        x = self.head(x)
        return x

    def construct(self, x):
        x = self.forward_features(x)
        if self.with_neck:
            x = self.neck(x)
        if self.with_head:
            x = self.forward_head(x)
        return x
mindocr.models.backbones.mindcv_models.vit.DenseHead

Bases: nn.Cell

LinearClsHead architecture.

PARAMETER DESCRIPTION
input_channel

The number of input channel.

TYPE: int

num_classes

Number of classes.

TYPE: int

has_bias

Specifies whether the layer uses a bias vector. Default: True.

TYPE: bool DEFAULT: True

activation

activate function applied to the output. Eg. ReLU. Default: None.

TYPE: Union[str, Cell, Primitive] DEFAULT: None

keep_prob

Dropout keeping rate, between [0, 1]. E.g. rate=0.9, means dropping out 10% of input. Default: 1.0.

TYPE: float DEFAULT: 1.0

RETURNS DESCRIPTION

Tensor, output tensor.

Source code in mindocr\models\backbones\mindcv_models\vit.py
class DenseHead(nn.Cell):
    """
    LinearClsHead architecture.

    Args:
        input_channel (int): The number of input channel.
        num_classes (int): Number of classes.
        has_bias (bool): Specifies whether the layer uses a bias vector. Default: True.
        activation (Union[str, Cell, Primitive]): activate function applied to the output. Eg. `ReLU`. Default: None.
        keep_prob (float): Dropout keeping rate, between [0, 1]. E.g. rate=0.9, means dropping out 10% of input.
            Default: 1.0.

    Returns:
        Tensor, output tensor.
    """

    def __init__(
        self,
        input_channel: int,
        num_classes: int,
        has_bias: bool = True,
        activation: Optional[Union[str, nn.Cell]] = None,
        keep_prob: float = 1.0,
    ) -> None:
        super().__init__()

        self.dropout = nn.Dropout(keep_prob)
        self.classifier = nn.Dense(input_channel, num_classes, has_bias=has_bias, activation=activation)

    def construct(self, x):
        if self.training:
            x = self.dropout(x)
        x = self.classifier(x)
        return x
mindocr.models.backbones.mindcv_models.vit.DropPath

Bases: nn.Cell

Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).

Source code in mindocr\models\backbones\mindcv_models\vit.py
class DropPath(nn.Cell):
    """
    Drop paths (Stochastic Depth) per sample  (when applied in main path of residual blocks).
    """

    def __init__(self, keep_prob=None, seed=0):
        super().__init__()
        self.keep_prob = 1 - keep_prob
        seed = min(seed, 0)
        self.rand = P.UniformReal(seed=seed)
        self.shape = P.Shape()
        self.floor = P.Floor()

    def construct(self, x):
        if self.training:
            x_shape = self.shape(x)
            random_tensor = self.rand((x_shape[0], 1, 1))
            random_tensor = random_tensor + self.keep_prob
            random_tensor = self.floor(random_tensor)
            x = x / self.keep_prob
            x = x * random_tensor

        return x
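During training, stochastic depth zeroes the residual branch for whole samples and rescales the survivors by 1/keep_prob, so the output's expectation matches the input. A hedged NumPy sketch of that invariant (a simplified analogue of the class above, not MindSpore code):

```python
import numpy as np

# Hedged numpy sketch of stochastic depth: drop a whole sample's branch
# with probability drop_prob, rescale survivors so E[out] == x.
def drop_path(x: np.ndarray, drop_prob: float, rng) -> np.ndarray:
    keep_prob = 1.0 - drop_prob
    # one Bernoulli draw per sample, broadcast over the remaining dims
    mask = np.floor(rng.uniform(size=(x.shape[0], 1, 1)) + keep_prob)
    return x / keep_prob * mask

rng = np.random.default_rng(0)
x = np.ones((10000, 4, 8))
out = drop_path(x, drop_prob=0.2, rng=rng)
print(out.mean())  # close to 1.0: the expectation is preserved
```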
mindocr.models.backbones.mindcv_models.vit.FeedForward

Bases: nn.Cell

Feed Forward layer implementation.

PARAMETER DESCRIPTION
in_features

The dimension of input features.

TYPE: int

hidden_features

The dimension of hidden features. Default: None.

TYPE: int DEFAULT: None

out_features

The dimension of output features. Default: None

TYPE: int DEFAULT: None

activation

Activation function which will be stacked on top of the normalization layer (if not None), otherwise on top of the conv layer. Default: nn.GELU.

TYPE: nn.Cell DEFAULT: nn.GELU

keep_prob

The keep rate, greater than 0 and less equal than 1. Default: 1.0.

TYPE: float DEFAULT: 1.0

RETURNS DESCRIPTION

Tensor, output tensor.

Examples:

>>> ops = FeedForward(768, 3072)
Source code in mindocr\models\backbones\mindcv_models\vit.py
class FeedForward(nn.Cell):
    """
    Feed Forward layer implementation.

    Args:
        in_features (int): The dimension of input features.
        hidden_features (int): The dimension of hidden features. Default: None.
        out_features (int): The dimension of output features. Default: None
        activation (nn.Cell): Activation function which will be stacked on top of the
            normalization layer (if not None), otherwise on top of the conv layer. Default: nn.GELU.
        keep_prob (float): The keep rate, greater than 0 and less equal than 1. Default: 1.0.

    Returns:
        Tensor, output tensor.

    Examples:
        >>> ops = FeedForward(768, 3072)
    """

    def __init__(
        self,
        in_features: int,
        hidden_features: Optional[int] = None,
        out_features: Optional[int] = None,
        activation: nn.Cell = nn.GELU,
        keep_prob: float = 1.0,
    ):
        super().__init__()
        out_features = out_features or in_features
        hidden_features = hidden_features or in_features
        self.dense1 = nn.Dense(in_features, hidden_features)
        self.activation = activation()
        self.dense2 = nn.Dense(hidden_features, out_features)
        self.dropout = nn.Dropout(keep_prob)

    def construct(self, x):
        """Feed Forward construct."""
        x = self.dense1(x)
        x = self.activation(x)
        x = self.dropout(x)
        x = self.dense2(x)
        x = self.dropout(x)

        return x
mindocr.models.backbones.mindcv_models.vit.FeedForward.construct(x)

Feed Forward construct.

Source code in mindocr\models\backbones\mindcv_models\vit.py
def construct(self, x):
    """Feed Forward construct."""
    x = self.dense1(x)
    x = self.activation(x)
    x = self.dropout(x)
    x = self.dense2(x)
    x = self.dropout(x)

    return x
mindocr.models.backbones.mindcv_models.vit.MultilayerDenseHead

Bases: nn.Cell

MultilayerDenseHead architecture.

PARAMETER DESCRIPTION
input_channel

The number of input channel.

TYPE: int

num_classes

Number of classes.

TYPE: int

mid_channel

Number of channels in the hidden fc layers.

TYPE: list

keep_prob

Dropout keeping rate, between [0, 1]. E.g. rate=0.9, means dropping out 10% of input.

TYPE: list

activation

activate function applied to the output. Eg. ReLU.

TYPE: list

RETURNS DESCRIPTION

Tensor, output tensor.

Source code in mindocr\models\backbones\mindcv_models\vit.py
class MultilayerDenseHead(nn.Cell):
    """
    MultilayerDenseHead architecture.

    Args:
        input_channel (int): The number of input channel.
        num_classes (int): Number of classes.
        mid_channel (list): Number of channels in the hidden fc layers.
        keep_prob (list): Dropout keeping rate, between [0, 1]. E.g. rate=0.9, means dropping out 10% of
        input.
        activation (list): activate function applied to the output. Eg. `ReLU`.

    Returns:
        Tensor, output tensor.
    """

    def __init__(
        self,
        input_channel: int,
        num_classes: int,
        mid_channel: List[int],
        keep_prob: List[float],
        activation: List[Optional[Union[str, nn.Cell]]],
    ) -> None:
        super().__init__()
        mid_channel.append(num_classes)
        assert len(mid_channel) == len(activation) == len(keep_prob), "The length of the list should be the same."

        length = len(activation)
        head = []

        for i in range(length):
            linear = DenseHead(
                input_channel,
                mid_channel[i],
                activation=activation[i],
                keep_prob=keep_prob[i],
            )
            head.append(linear)
            input_channel = mid_channel[i]

        self.classifier = nn.SequentialCell(head)

    def construct(self, x):
        x = self.classifier(x)

        return x
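The width chaining in `__init__` above can be sketched in plain Python: `num_classes` is appended to `mid_channel`, and each `DenseHead`'s output width becomes the next one's input width (the concrete numbers below are illustrative):

```python
# Sketch of MultilayerDenseHead's layer-width chaining.
input_channel, num_classes = 768, 1000
mid_channel = [768]                       # e.g. one representation layer
mid_channel = mid_channel + [num_classes] # num_classes is appended last

shapes, in_ch = [], input_channel
for out_ch in mid_channel:
    shapes.append((in_ch, out_ch))  # (in_features, out_features) of each DenseHead
    in_ch = out_ch                  # output feeds the next layer
print(shapes)  # [(768, 768), (768, 1000)]
```

This is why `mid_channel`, `activation`, and `keep_prob` must all have the same length after the append, as the assertion in the source enforces.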
mindocr.models.backbones.mindcv_models.vit.PatchEmbedding

Bases: nn.Cell

Patch embedding layer for ViT. First rearrange b c (h p) (w p) -> b (h w) (p p c).

PARAMETER DESCRIPTION
image_size

Input image size. Default: 224.

TYPE: int DEFAULT: 224

patch_size

Patch size of image. Default: 16.

TYPE: int DEFAULT: 16

embed_dim

The dimension of embedding. Default: 768.

TYPE: int DEFAULT: 768

input_channels

The number of input channels. Default: 3.

TYPE: int DEFAULT: 3

RETURNS DESCRIPTION

Tensor, output tensor.

Examples:

>>> ops = PatchEmbedding(224, 16, 768, 3)
Source code in mindocr\models\backbones\mindcv_models\vit.py
class PatchEmbedding(nn.Cell):
    """
    Patch embedding layer for ViT. First rearrange b c (h p) (w p) -> b (h w) (p p c).

    Args:
        image_size (int): Input image size. Default: 224.
        patch_size (int): Patch size of image. Default: 16.
        embed_dim (int): The dimension of embedding. Default: 768.
        input_channels (int): The number of input channel. Default: 3.

    Returns:
        Tensor, output tensor.

    Examples:
        >>> ops = PatchEmbedding(224, 16, 768, 3)
    """

    MIN_NUM_PATCHES = 4

    def __init__(
        self,
        image_size: int = 224,
        patch_size: int = 16,
        embed_dim: int = 768,
        input_channels: int = 3,
    ):
        super().__init__()
        self.image_size = image_size
        self.patch_size = patch_size
        self.num_patches = (image_size // patch_size) ** 2
        self.conv = nn.Conv2d(input_channels, embed_dim, kernel_size=patch_size, stride=patch_size, has_bias=True)
        self.reshape = ops.Reshape()
        self.transpose = ops.Transpose()

    def construct(self, x):
        """Path Embedding construct."""
        x = self.conv(x)
        b, c, h, w = x.shape
        x = self.reshape(x, (b, c, h * w))
        x = self.transpose(x, (0, 2, 1))

        return x
mindocr.models.backbones.mindcv_models.vit.PatchEmbedding.construct(x)

Patch Embedding construct.

Source code in mindocr\models\backbones\mindcv_models\vit.py
def construct(self, x):
    """Path Embedding construct."""
    x = self.conv(x)
    b, c, h, w = x.shape
    x = self.reshape(x, (b, c, h * w))
    x = self.transpose(x, (0, 2, 1))

    return x
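The shape arithmetic of `construct` can be checked in NumPy (an illustrative sketch with the documented defaults; `np.zeros` stands in for the conv output):

```python
import numpy as np

image_size, patch_size, embed_dim, b = 224, 16, 768, 2
num_patches = (image_size // patch_size) ** 2  # 14 * 14 = 196

# The strided conv yields (b, embed_dim, 14, 14); reshape + transpose then
# produce the token sequence (b, num_patches, embed_dim) that construct returns.
grid = image_size // patch_size
feat = np.zeros((b, embed_dim, grid, grid))
tokens = feat.reshape(b, embed_dim, grid * grid).transpose(0, 2, 1)
print(num_patches, tokens.shape)  # 196 (2, 196, 768)
```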
mindocr.models.backbones.mindcv_models.vit.ResidualCell

Bases: nn.Cell

Cell which implements Residual function:

\[output = x + f(x)\]
PARAMETER DESCRIPTION
cell

Cell needed to add residual block.

TYPE: Cell

RETURNS DESCRIPTION

Tensor, output tensor.

Examples:

>>> ops = ResidualCell(nn.Dense(3,4))
Source code in mindocr\models\backbones\mindcv_models\vit.py
class ResidualCell(nn.Cell):
    """
    Cell which implements Residual function:

    $$output = x + f(x)$$

    Args:
        cell (Cell): Cell needed to add residual block.

    Returns:
        Tensor, output tensor.

    Examples:
        >>> ops = ResidualCell(nn.Dense(3,4))
    """

    def __init__(self, cell):
        super().__init__()
        self.cell = cell

    def construct(self, x):
        """ResidualCell construct."""
        return self.cell(x) + x
mindocr.models.backbones.mindcv_models.vit.ResidualCell.construct(x)

ResidualCell construct.

Source code in mindocr\models\backbones\mindcv_models\vit.py
def construct(self, x):
    """ResidualCell construct."""
    return self.cell(x) + x
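The residual contract is small enough to verify directly (a plain-Python sketch with an arbitrary `f`):

```python
import numpy as np

def residual(cell, x):
    # ResidualCell contract: output = x + f(x)
    return cell(x) + x

x = np.array([1.0, 2.0, 3.0])
out = residual(lambda v: 2.0 * v, x)  # f(x) = 2x, so output = 3x
print(out)  # [3. 6. 9.]
```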
mindocr.models.backbones.mindcv_models.vit.TransformerEncoder

Bases: nn.Cell

TransformerEncoder implementation.

PARAMETER DESCRIPTION
dim

The dimension of embedding.

TYPE: int

num_layers

The depth of transformer.

TYPE: int

num_heads

The number of attention heads.

TYPE: int

mlp_dim

The dimension of MLP hidden layer.

TYPE: int

keep_prob

The keep rate, greater than 0 and less than or equal to 1. Default: 1.0.

TYPE: float DEFAULT: 1.0

attention_keep_prob

The keep rate for attention. Default: 1.0.

TYPE: float DEFAULT: 1.0

drop_path_keep_prob

The keep rate for drop path. Default: 1.0.

TYPE: float DEFAULT: 1.0

activation

Activation function which will be stacked on top of the normalization layer (if not None), otherwise on top of the conv layer. Default: nn.GELU.

TYPE: nn.Cell DEFAULT: nn.GELU

norm

Norm layer that will be stacked on top of the convolution layer. Default: nn.LayerNorm.

TYPE: nn.Cell DEFAULT: nn.LayerNorm

RETURNS DESCRIPTION

Tensor, output tensor.

Examples:

>>> ops = TransformerEncoder(768, 12, 12, 3072)
Source code in mindocr\models\backbones\mindcv_models\vit.py
class TransformerEncoder(nn.Cell):
    """
    TransformerEncoder implementation.

    Args:
        dim (int): The dimension of embedding.
        num_layers (int): The depth of transformer.
        num_heads (int): The number of attention heads.
        mlp_dim (int): The dimension of MLP hidden layer.
        keep_prob (float): The keep rate, greater than 0 and less equal than 1. Default: 1.0.
        attention_keep_prob (float): The keep rate for attention. Default: 1.0.
        drop_path_keep_prob (float): The keep rate for drop path. Default: 1.0.
        activation (nn.Cell): Activation function which will be stacked on top of the
        normalization layer (if not None), otherwise on top of the conv layer. Default: nn.GELU.
        norm (nn.Cell, optional): Norm layer that will be stacked on top of the convolution
        layer. Default: nn.LayerNorm.

    Returns:
        Tensor, output tensor.

    Examples:
        >>> ops = TransformerEncoder(768, 12, 12, 3072)
    """

    def __init__(
        self,
        dim: int,
        num_layers: int,
        num_heads: int,
        mlp_dim: int,
        keep_prob: float = 1.0,
        attention_keep_prob: float = 1.0,
        drop_path_keep_prob: float = 1.0,
        activation: nn.Cell = nn.GELU,
        norm: nn.Cell = nn.LayerNorm,
    ):
        super().__init__()
        drop_path_rate = 1 - drop_path_keep_prob
        dpr = [i.item() for i in np.linspace(0, drop_path_rate, num_layers)]
        attn_seeds = [np.random.randint(1024) for _ in range(num_layers)]
        mlp_seeds = [np.random.randint(1024) for _ in range(num_layers)]

        layers = []
        for i in range(num_layers):
            normalization1 = norm((dim,))
            normalization2 = norm((dim,))
            attention = Attention(dim=dim,
                                  num_heads=num_heads,
                                  keep_prob=keep_prob,
                                  attention_keep_prob=attention_keep_prob)

            feedforward = FeedForward(in_features=dim,
                                      hidden_features=mlp_dim,
                                      activation=activation,
                                      keep_prob=keep_prob)

            if drop_path_rate > 0:
                layers.append(
                    nn.SequentialCell([
                        ResidualCell(nn.SequentialCell([normalization1,
                                                        attention,
                                                        DropPath(dpr[i], attn_seeds[i])])),
                        ResidualCell(nn.SequentialCell([normalization2,
                                                        feedforward,
                                                        DropPath(dpr[i], mlp_seeds[i])]))]))
            else:
                layers.append(
                    nn.SequentialCell([
                        ResidualCell(nn.SequentialCell([normalization1,
                                                        attention])),
                        ResidualCell(nn.SequentialCell([normalization2,
                                                        feedforward]))
                    ])
                )
        self.layers = nn.SequentialCell(layers)

    def construct(self, x):
        """Transformer construct."""
        return self.layers(x)
mindocr.models.backbones.mindcv_models.vit.TransformerEncoder.construct(x)

Transformer construct.

Source code in mindocr\models\backbones\mindcv_models\vit.py
def construct(self, x):
    """Transformer construct."""
    return self.layers(x)
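The stochastic-depth schedule built in `__init__` can be reproduced with NumPy (example values assumed; the source computes `drop_path_rate = 1 - drop_path_keep_prob` and spreads it linearly over the blocks):

```python
import numpy as np

num_layers, drop_path_keep_prob = 12, 0.9
drop_path_rate = 1 - drop_path_keep_prob

# Per-block drop-path probabilities grow linearly from 0 in the first
# block to drop_path_rate in the last, exactly as in TransformerEncoder.
dpr = [i.item() for i in np.linspace(0, drop_path_rate, num_layers)]
print(dpr[0], round(dpr[-1], 4))  # 0.0 0.1
```

When `drop_path_rate` is 0 (the default `drop_path_keep_prob=1.0`), the source skips the `DropPath` wrappers entirely, which this schedule makes all-zero anyway.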
mindocr.models.backbones.mindcv_models.vit.ViT

Bases: nn.Cell

Vision Transformer architecture implementation.

PARAMETER DESCRIPTION
image_size

Input image size. Default: 224.

TYPE: int DEFAULT: 224

input_channels

The number of input channels. Default: 3.

TYPE: int DEFAULT: 3

patch_size

Patch size of image. Default: 16.

TYPE: int DEFAULT: 16

embed_dim

The dimension of embedding. Default: 768.

TYPE: int DEFAULT: 768

num_layers

The depth of transformer. Default: 12.

TYPE: int DEFAULT: 12

num_heads

The number of attention heads. Default: 12.

TYPE: int DEFAULT: 12

mlp_dim

The dimension of MLP hidden layer. Default: 3072.

TYPE: int DEFAULT: 3072

keep_prob

The keep rate, greater than 0 and less than or equal to 1. Default: 1.0.

TYPE: float DEFAULT: 1.0

attention_keep_prob

The keep rate for attention layer. Default: 1.0.

TYPE: float DEFAULT: 1.0

drop_path_keep_prob

The keep rate for drop path. Default: 1.0.

TYPE: float DEFAULT: 1.0

activation

Activation function which will be stacked on top of the normalization layer (if not None), otherwise on top of the conv layer. Default: nn.GELU.

TYPE: nn.Cell DEFAULT: nn.GELU

norm

Norm layer that will be stacked on top of the convolution layer. Default: nn.LayerNorm.

TYPE: nn.Cell DEFAULT: nn.LayerNorm

pool

The method of pooling. Default: 'cls'.

TYPE: str DEFAULT: 'cls'

Inputs
  • x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.
Outputs

Tensor of shape :math:`(N, 768)`

RAISES DESCRIPTION
ValueError

If split is not 'train', 'test' or 'infer'.

Supported Platforms

GPU

Examples:

>>> net = ViT()
>>> x = ms.Tensor(np.ones([1, 3, 224, 224]), ms.float32)
>>> output = net(x)
>>> print(output.shape)
(1, 768)

About ViT:

Vision Transformer (ViT) shows that a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.

Citation:

.. code-block::

@article{2020An,
title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
author={Dosovitskiy, A. and Beyer, L. and Kolesnikov, A. and Weissenborn, D. and Houlsby, N.},
year={2020},
}
Source code in mindocr\models\backbones\mindcv_models\vit.py
class ViT(nn.Cell):
    """
    Vision Transformer architecture implementation.

    Args:
        image_size (int): Input image size. Default: 224.
        input_channels (int): The number of input channel. Default: 3.
        patch_size (int): Patch size of image. Default: 16.
        embed_dim (int): The dimension of embedding. Default: 768.
        num_layers (int): The depth of transformer. Default: 12.
        num_heads (int): The number of attention heads. Default: 12.
        mlp_dim (int): The dimension of MLP hidden layer. Default: 3072.
        keep_prob (float): The keep rate, greater than 0 and less equal than 1. Default: 1.0.
        attention_keep_prob (float): The keep rate for attention layer. Default: 1.0.
        drop_path_keep_prob (float): The keep rate for drop path. Default: 1.0.
        activation (nn.Cell): Activation function which will be stacked on top of the
            normalization layer (if not None), otherwise on top of the conv layer. Default: nn.GELU.
        norm (nn.Cell, optional): Norm layer that will be stacked on top of the convolution
            layer. Default: nn.LayerNorm.
        pool (str): The method of pooling. Default: 'cls'.

    Inputs:
        - **x** (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.

    Outputs:
        Tensor of shape :math:`(N, 768)`

    Raises:
        ValueError: If `split` is not 'train', 'test' or 'infer'.

    Supported Platforms:
        ``GPU``

    Examples:
        >>> net = ViT()
        >>> x = ms.Tensor(np.ones([1, 3, 224, 224]), ms.float32)
        >>> output = net(x)
        >>> print(output.shape)
        (1, 768)

    About ViT:

    Vision Transformer (ViT) shows that a pure transformer applied directly to sequences of image
    patches can perform very well on image classification tasks. When pre-trained on large amounts
    of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet,
    CIFAR-100, VTAB, etc.), Vision Transformer (ViT) attains excellent results compared to state-of-the-art
    convolutional networks while requiring substantially fewer computational resources to train.

    Citation:

    .. code-block::

        @article{2020An,
        title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
        author={Dosovitskiy, A. and Beyer, L. and Kolesnikov, A. and Weissenborn, D. and Houlsby, N.},
        year={2020},
        }
    """

    def __init__(
        self,
        image_size: int = 224,
        input_channels: int = 3,
        patch_size: int = 16,
        embed_dim: int = 768,
        num_layers: int = 12,
        num_heads: int = 12,
        mlp_dim: int = 3072,
        keep_prob: float = 1.0,
        attention_keep_prob: float = 1.0,
        drop_path_keep_prob: float = 1.0,
        activation: nn.Cell = nn.GELU,
        norm: Optional[nn.Cell] = nn.LayerNorm,
        pool: str = "cls",
    ) -> None:
        super().__init__()

        # Validator.check_string(pool, ["cls", "mean"], "pool type")

        self.patch_embedding = PatchEmbedding(image_size=image_size,
                                              patch_size=patch_size,
                                              embed_dim=embed_dim,
                                              input_channels=input_channels)
        num_patches = self.patch_embedding.num_patches

        if pool == "cls":
            self.cls_token = init(init_type=Normal(sigma=1.0),
                                  shape=(1, 1, embed_dim),
                                  dtype=ms.float32,
                                  name="cls",
                                  requires_grad=True)
            self.pos_embedding = init(init_type=Normal(sigma=1.0),
                                      shape=(1, num_patches + 1, embed_dim),
                                      dtype=ms.float32,
                                      name="pos_embedding",
                                      requires_grad=True)
            self.concat = ops.Concat(axis=1)
        else:
            self.pos_embedding = init(init_type=Normal(sigma=1.0),
                                      shape=(1, num_patches, embed_dim),
                                      dtype=ms.float32,
                                      name="pos_embedding",
                                      requires_grad=True)
            self.mean = ops.ReduceMean(keep_dims=False)

        self.pool = pool
        self.pos_dropout = nn.Dropout(keep_prob)
        self.norm = norm((embed_dim,))
        self.tile = ops.Tile()
        self.transformer = TransformerEncoder(
            dim=embed_dim,
            num_layers=num_layers,
            num_heads=num_heads,
            mlp_dim=mlp_dim,
            keep_prob=keep_prob,
            attention_keep_prob=attention_keep_prob,
            drop_path_keep_prob=drop_path_keep_prob,
            activation=activation,
            norm=norm,
        )

    def construct(self, x):
        """ViT construct."""
        x = self.patch_embedding(x)

        if self.pool == "cls":
            cls_tokens = self.tile(self.cls_token, (x.shape[0], 1, 1))
            x = self.concat((cls_tokens, x))
            x += self.pos_embedding
        else:
            x += self.pos_embedding
        x = self.pos_dropout(x)
        x = self.transformer(x)
        x = self.norm(x)

        if self.pool == "cls":
            x = x[:, 0]
        else:
            x = self.mean(x, (1, 2))  # (1,) or (1,2)
        return x
mindocr.models.backbones.mindcv_models.vit.ViT.construct(x)

ViT construct.

Source code in mindocr\models\backbones\mindcv_models\vit.py
def construct(self, x):
    """ViT construct."""
    x = self.patch_embedding(x)

    if self.pool == "cls":
        cls_tokens = self.tile(self.cls_token, (x.shape[0], 1, 1))
        x = self.concat((cls_tokens, x))
        x += self.pos_embedding
    else:
        x += self.pos_embedding
    x = self.pos_dropout(x)
    x = self.transformer(x)
    x = self.norm(x)

    if self.pool == "cls":
        x = x[:, 0]
    else:
        x = self.mean(x, (1, 2))  # (1,) or (1,2)
    return x
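The `"cls"` pooling branch of `construct` can be checked with NumPy shapes (an illustrative sketch; the encoder itself is elided):

```python
import numpy as np

b, num_patches, embed_dim = 2, 196, 768
tokens = np.zeros((b, num_patches, embed_dim))  # patch_embedding output

# pool == "cls": the learned token is tiled over the batch, prepended,
# and position 0 is taken after the transformer + norm.
cls_token = np.zeros((1, 1, embed_dim))
cls_tokens = np.tile(cls_token, (b, 1, 1))
with_cls = np.concatenate([cls_tokens, tokens], axis=1)  # (2, 197, 768)
cls_out = with_cls[:, 0]                                 # (2, 768)
print(with_cls.shape, cls_out.shape)
```

This is also why the positional embedding has `num_patches + 1` rows in the `"cls"` branch but only `num_patches` rows in the mean-pooling branch.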
mindocr.models.backbones.mindcv_models.vit.vit(image_size, input_channels, patch_size, embed_dim, num_layers, num_heads, num_classes, mlp_dim, dropout=0.0, attention_dropout=0.0, drop_path_rate=0.0, activation=nn.GELU, norm=nn.LayerNorm, pool='cls', representation_size=None, pretrained=False, url_cfg=None)

Vision Transformer architecture.

Source code in mindocr\models\backbones\mindcv_models\vit.py
def vit(
    image_size: int,
    input_channels: int,
    patch_size: int,
    embed_dim: int,
    num_layers: int,
    num_heads: int,
    num_classes: int,
    mlp_dim: int,
    dropout: float = 0.0,
    attention_dropout: float = 0.0,
    drop_path_rate: float = 0.0,
    activation: nn.Cell = nn.GELU,
    norm: nn.Cell = nn.LayerNorm,
    pool: str = "cls",
    representation_size: Optional[int] = None,
    pretrained: bool = False,
    url_cfg: dict = None,
) -> ViT:
    """Vision Transformer architecture."""
    backbone = ViT(
        image_size=image_size,
        input_channels=input_channels,
        patch_size=patch_size,
        embed_dim=embed_dim,
        num_layers=num_layers,
        num_heads=num_heads,
        mlp_dim=mlp_dim,
        keep_prob=1.0 - dropout,
        attention_keep_prob=1.0 - attention_dropout,
        drop_path_keep_prob=1.0 - drop_path_rate,
        activation=activation,
        norm=norm,
        pool=pool,
    )
    if representation_size:
        head = MultilayerDenseHead(
            input_channel=embed_dim,
            num_classes=num_classes,
            mid_channel=[representation_size],
            activation=["tanh", None],
            keep_prob=[1.0, 1.0],
        )
    else:
        head = DenseHead(input_channel=embed_dim, num_classes=num_classes)

    model = BaseClassifier(backbone=backbone, head=head)

    if pretrained:
        # Download the pre-trained checkpoint file from url, and load ckpt file.
        load_pretrained(model, url_cfg, num_classes=num_classes, in_channels=input_channels)

    return model
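Two details of `vit()` worth noting: it takes *drop* rates and converts them to the *keep* rates the cells expect, and `representation_size` toggles between a two-layer `MultilayerDenseHead` and a single `DenseHead`. A plain-Python sketch with hypothetical rate values:

```python
# Hypothetical rates illustrating vit()'s drop-rate -> keep-rate conversion.
dropout, attention_dropout, drop_path_rate = 0.1, 0.1, 0.1
keep_prob = 1.0 - dropout                    # passed to ViT as keep_prob
attention_keep_prob = 1.0 - attention_dropout
drop_path_keep_prob = 1.0 - drop_path_rate

# representation_size truthy -> MultilayerDenseHead([repr_size, num_classes]),
# otherwise a single DenseHead(embed_dim, num_classes).
representation_size = 768
head_layers = 2 if representation_size else 1
print(round(keep_prob, 1), head_layers)  # 0.9 2
```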
mindocr.models.backbones.mindcv_models.vit.vit_b_16_224(pretrained=False, num_classes=1000, in_channels=3, image_size=224, has_logits=False, drop_rate=0.0, drop_path_rate=0.0)

Constructs a vit_b_16 architecture from `An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale <https://arxiv.org/abs/2010.11929>`_.

PARAMETER DESCRIPTION
pretrained

Whether to download and load the pre-trained model. Default: False.

TYPE: bool DEFAULT: False

num_classes

The number of classes. Default: 1000.

TYPE: int DEFAULT: 1000

in_channels

The number of input channels. Default: 3.

TYPE: int DEFAULT: 3

image_size

The input image size. Default: 224 for ImageNet.

TYPE: int DEFAULT: 224

has_logits

Whether has logits or not. Default: False.

TYPE: bool DEFAULT: False

drop_rate

The dropout rate. Default: 0.0.

TYPE: float DEFAULT: 0.0

drop_path_rate

The stochastic depth rate. Default: 0.0.

TYPE: float DEFAULT: 0.0

RETURNS DESCRIPTION
ViT

ViT network, MindSpore.nn.Cell

Inputs
  • x (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.

Examples:

>>> net = vit_b_16_224()
>>> x = ms.Tensor(np.ones([1, 3, 224, 224]), ms.float32)
>>> output = net(x)
>>> print(output.shape)
(1, 1000)
Outputs

Tensor of shape :math:`(N, CLASSES_{out})`

Supported Platforms

GPU

Source code in mindocr\models\backbones\mindcv_models\vit.py
@register_model
def vit_b_16_224(
    pretrained: bool = False,
    num_classes: int = 1000,
    in_channels: int = 3,
    image_size: int = 224,
    has_logits: bool = False,
    drop_rate: float = 0.0,
    # attention-dropout: float = 0.0,
    drop_path_rate: float = 0.0,
) -> ViT:
    """
    Constructs a vit_b_16 architecture from
    `An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale <https://arxiv.org/abs/2010.11929>`_.

    Args:
        pretrained (bool): Whether to download and load the pre-trained model. Default: False.
        num_classes (int): The number of classes. Default: 1000.
        in_channels (int): The number of input channels. Default: 3.
        image_size (int): The input image size. Default: 224 for ImageNet.
        has_logits (bool): Whether has logits or not. Default: False.
        drop_rate (float): The dropout rate. Default: 0.0.
        drop_path_rate (float): The stochastic depth rate. Default: 0.0.

    Returns:
        ViT network, MindSpore.nn.Cell

    Inputs:
        - **x** (Tensor) - Tensor of shape :math:`(N, C_{in}, H_{in}, W_{in})`.

    Examples:
        >>> net = vit_b_16_224()
        >>> x = ms.Tensor(np.ones([1, 3, 224, 224]), ms.float32)
        >>> output = net(x)
        >>> print(output.shape)
        (1, 1000)

    Outputs:
        Tensor of shape :math:`(N, CLASSES_{out})`

    Supported Platforms:
        ``GPU``
    """
    config = ConfigDict()
    config.image_size = image_size
    config.num_classes = num_classes
    config.patch_size = 16
    config.embed_dim = 768
    config.mlp_dim = 3072
    config.num_heads = 12
    config.num_layers = 12
    config.dropout = drop_rate
    config.attention_dropout = drop_rate  # attention-dropout
    config.drop_path_rate = drop_path_rate
    config.pretrained = pretrained
    config.input_channels = in_channels
    config.pool = "cls"
    config.representation_size = 768 if has_logits else None

    config.url_cfg = default_cfgs["vit_b_16_224"]

    return vit(**config)
mindocr.models.backbones.mindcv_models.vit.vit_b_16_384(pretrained=False, num_classes=1000, in_channels=3, image_size=384, has_logits=False, drop_rate=0.0, drop_path_rate=0.0)

construct and return a ViT network

Source code in mindocr\models\backbones\mindcv_models\vit.py
@register_model
def vit_b_16_384(
    pretrained: bool = False,
    num_classes: int = 1000,
    in_channels: int = 3,
    image_size: int = 384,
    has_logits: bool = False,
    drop_rate: float = 0.0,
    # attention-dropout: float = 0.0,
    drop_path_rate: float = 0.0,
) -> ViT:
    """construct and return a ViT network"""
    config = ConfigDict()
    config.image_size = image_size
    config.num_classes = num_classes
    config.patch_size = 16
    config.embed_dim = 768
    config.mlp_dim = 3072
    config.num_heads = 12
    config.num_layers = 12
    config.dropout = drop_rate
    config.attention_dropout = drop_rate  # attention-dropout
    config.drop_path_rate = drop_path_rate
    config.pretrained = pretrained
    config.input_channels = in_channels
    config.pool = "cls"
    config.representation_size = 768 if has_logits else None

    config.url_cfg = default_cfgs["vit_b_16_384"]

    return vit(**config)
mindocr.models.backbones.mindcv_models.vit.vit_b_32_224(pretrained=False, num_classes=1000, in_channels=3, image_size=224, has_logits=False, drop_rate=0.0, drop_path_rate=0.0)

construct and return a ViT network

Source code in mindocr\models\backbones\mindcv_models\vit.py
@register_model
def vit_b_32_224(
    pretrained: bool = False,
    num_classes: int = 1000,
    in_channels: int = 3,
    image_size: int = 224,
    has_logits: bool = False,
    drop_rate: float = 0.0,
    # attention-dropout: float = 0.0,
    drop_path_rate: float = 0.0,
) -> ViT:
    """construct and return a ViT network"""
    config = ConfigDict()
    config.image_size = image_size
    config.num_classes = num_classes
    config.patch_size = 32
    config.embed_dim = 768
    config.mlp_dim = 3072
    config.num_heads = 12
    config.num_layers = 12
    config.dropout = drop_rate
    config.attention_dropout = drop_rate  # attention-dropout
    config.drop_path_rate = drop_path_rate
    config.pretrained = pretrained
    config.input_channels = in_channels
    config.pool = "cls"
    config.representation_size = 768 if has_logits else None

    config.url_cfg = default_cfgs["vit_b_32_224"]

    return vit(**config)
mindocr.models.backbones.mindcv_models.vit.vit_b_32_384(pretrained=False, num_classes=1000, in_channels=3, image_size=384, has_logits=False, drop_rate=0.0, drop_path_rate=0.0)

construct and return a ViT network

Source code in mindocr\models\backbones\mindcv_models\vit.py
@register_model
def vit_b_32_384(
    pretrained: bool = False,
    num_classes: int = 1000,
    in_channels: int = 3,
    image_size: int = 384,
    has_logits: bool = False,
    drop_rate: float = 0.0,
    # attention_dropout: float = 0.0,
    drop_path_rate: float = 0.0,
) -> ViT:
    """construct and return a ViT network"""
    config = ConfigDict()
    config.image_size = image_size
    config.num_classes = num_classes
    config.patch_size = 32
    config.embed_dim = 768
    config.mlp_dim = 3072
    config.num_heads = 12
    config.num_layers = 12
    config.dropout = drop_rate
    config.attention_dropout = drop_rate  # attention_dropout
    config.drop_path_rate = drop_path_rate
    config.pretrained = pretrained
    config.input_channels = in_channels
    config.pool = "cls"
    config.representation_size = 768 if has_logits else None

    config.url_cfg = default_cfgs["vit_b_32_384"]

    return vit(**config)
mindocr.models.backbones.mindcv_models.vit.vit_l_16_224(pretrained=False, num_classes=1000, in_channels=3, image_size=224, has_logits=False, drop_rate=0.0, drop_path_rate=0.0)

construct and return a ViT network

Source code in mindocr\models\backbones\mindcv_models\vit.py
@register_model
def vit_l_16_224(
    pretrained: bool = False,
    num_classes: int = 1000,
    in_channels: int = 3,
    image_size: int = 224,
    has_logits: bool = False,
    drop_rate: float = 0.0,
    # attention-dropout: float = 0.0,
    drop_path_rate: float = 0.0,
) -> ViT:
    """construct and return a ViT network"""

    config = ConfigDict()
    config.image_size = image_size
    config.num_classes = num_classes
    config.patch_size = 16
    config.embed_dim = 1024
    config.mlp_dim = 4096
    config.num_heads = 16
    config.num_layers = 24
    config.dropout = drop_rate
    config.attention_dropout = drop_rate  # attention-dropout
    config.drop_path_rate = drop_path_rate
    config.input_channels = in_channels
    config.pool = "cls"
    config.pretrained = pretrained
    config.representation_size = 1024 if has_logits else None

    config.url_cfg = default_cfgs["vit_l_16_224"]

    return vit(**config)
mindocr.models.backbones.mindcv_models.vit.vit_l_16_384(pretrained=False, num_classes=1000, in_channels=3, image_size=384, has_logits=False, drop_rate=0.0, drop_path_rate=0.0)

construct and return a ViT network

Source code in mindocr\models\backbones\mindcv_models\vit.py
@register_model
def vit_l_16_384(
    pretrained: bool = False,
    num_classes: int = 1000,
    in_channels: int = 3,
    image_size: int = 384,
    has_logits: bool = False,
    drop_rate: float = 0.0,
    # attention-dropout: float = 0.0,
    drop_path_rate: float = 0.0,
) -> ViT:
    """construct and return a ViT network"""

    config = ConfigDict()
    config.image_size = image_size
    config.num_classes = num_classes
    config.patch_size = 16
    config.embed_dim = 1024
    config.mlp_dim = 4096
    config.num_heads = 16
    config.num_layers = 24
    config.dropout = drop_rate
    config.attention_dropout = drop_rate  # attention-dropout
    config.drop_path_rate = drop_path_rate
    config.input_channels = in_channels
    config.pool = "cls"
    config.pretrained = pretrained
    config.representation_size = 1024 if has_logits else None

    config.url_cfg = default_cfgs["vit_l_16_384"]

    return vit(**config)
mindocr.models.backbones.mindcv_models.vit.vit_l_32_224(pretrained=False, num_classes=1000, in_channels=3, image_size=224, has_logits=False, drop_rate=0.0, drop_path_rate=0.0)

construct and return a ViT network

Source code in mindocr\models\backbones\mindcv_models\vit.py
@register_model
def vit_l_32_224(
    pretrained: bool = False,
    num_classes: int = 1000,
    in_channels: int = 3,
    image_size: int = 224,
    has_logits: bool = False,
    drop_rate: float = 0.0,
    # attention-dropout: float = 0.0,
    drop_path_rate: float = 0.0,
) -> ViT:
    """construct and return a ViT network"""
    config = ConfigDict()
    config.image_size = image_size
    config.num_classes = num_classes
    config.patch_size = 32
    config.embed_dim = 1024
    config.mlp_dim = 4096
    config.num_heads = 16
    config.num_layers = 24
    config.dropout = drop_rate
    config.attention_dropout = drop_rate  # attention-dropout
    config.drop_path_rate = drop_path_rate
    config.pretrained = pretrained
    config.input_channels = in_channels
    config.pool = "cls"
    config.representation_size = 1024 if has_logits else None

    config.url_cfg = default_cfgs["vit_l_32_224"]

    return vit(**config)
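The `vit_l_*` factories above differ only in patch size and input resolution; everything else in the config is the shared ViT-Large recipe. A minimal plain-Python sketch of that pattern (with a hypothetical dict standing in for `ConfigDict`):

```python
def make_vit_l_config(image_size, patch_size, has_logits=False, drop_rate=0.0):
    """Hypothetical stand-in for the ConfigDict built by the vit_l_* factories."""
    return {
        "image_size": image_size,
        "patch_size": patch_size,
        "embed_dim": 1024,        # ViT-Large: 1024-d embeddings
        "mlp_dim": 4096,          # 4x embed_dim MLP
        "num_heads": 16,
        "num_layers": 24,
        "dropout": drop_rate,
        "pool": "cls",
        "representation_size": 1024 if has_logits else None,
    }

cfg_16_224 = make_vit_l_config(224, 16)
cfg_32_224 = make_vit_l_config(224, 32)

# The variants differ only in the resulting patch-grid size.
def num_patches(cfg):
    return (cfg["image_size"] // cfg["patch_size"]) ** 2
```

For example, `vit_l_16_224` yields a 14x14 grid (196 patches) while `vit_l_32_224` yields 7x7 (49 patches), which is the main capacity/cost knob between the two.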
mindocr.models.backbones.mindcv_models.xcit

MindSpore implementation of XCiT. Refer to: XCiT: Cross-Covariance Image Transformers

mindocr.models.backbones.mindcv_models.xcit.ClassAttention

Bases: nn.Cell

Class Attention Layer as in CaiT https://arxiv.org/abs/2103.17239

Source code in mindocr\models\backbones\mindcv_models\xcit.py
class ClassAttention(nn.Cell):
    """Class Attention Layer as in CaiT https://arxiv.org/abs/2103.17239
    """

    def __init__(self, dim, num_heads=8, qkv_bias=False, qk_scale=None, attn_drop=0., proj_drop=0.):
        super().__init__()
        self.num_heads = num_heads
        head_dim = dim // num_heads
        self.scale = qk_scale or head_dim ** -0.5

        self.qkv = nn.Dense(
            in_channels=dim, out_channels=dim * 3, has_bias=qkv_bias)
        self.attn_drop = nn.Dropout(keep_prob=1 - attn_drop)
        self.proj = nn.Dense(in_channels=dim, out_channels=dim)
        self.proj_drop = nn.Dropout(keep_prob=1 - proj_drop)
        self.softmax = nn.Softmax(axis=-1)

        self.attn_matmul_v = ops.BatchMatMul()

    def construct(self, x: Tensor) -> Tensor:
        B, N, C = x.shape

        qkv = self.qkv(x)
        qkv = ops.reshape(qkv, (B, N, 3, self.num_heads, C // self.num_heads))
        qkv = ops.transpose(qkv, (2, 0, 3, 1, 4))
        q, k, v = ops.unstack(qkv, axis=0)
        qc = q[:, :, 0:1]
        attn_cls = (qc * k).sum(-1) * self.scale
        attn_cls = self.softmax(attn_cls)
        attn_cls = self.attn_drop(attn_cls)

        attn_cls = ops.expand_dims(attn_cls, 2)
        cls_tkn = self.attn_matmul_v(attn_cls, v)
        cls_tkn = ops.transpose(cls_tkn, (0, 2, 1, 3))
        cls_tkn = ops.reshape(cls_tkn, (B, 1, C))
        cls_tkn = self.proj(cls_tkn)
        x = ops.concat((self.proj_drop(cls_tkn), x[:, 1:]), axis=1)
        return x
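The key point of class attention is that only the CLS token's query is used, so the attention map is 1xN rather than NxN. A minimal single-head NumPy sketch (the real layer splits into `num_heads` and applies dropout):

```python
import numpy as np

def class_attention(x, Wqkv, scale):
    """Sketch of CaiT-style class attention: only the CLS query attends.

    x: (N, C) token matrix, row 0 is the CLS token; Wqkv: (C, 3C).
    """
    N, C = x.shape
    qkv = x @ Wqkv                          # (N, 3C)
    q, k, v = qkv[:, :C], qkv[:, C:2 * C], qkv[:, 2 * C:]
    q_cls = q[0:1]                          # (1, C): only the CLS query is used
    attn = (q_cls @ k.T) * scale            # (1, N) - linear in N, not quadratic
    attn = np.exp(attn - attn.max())
    attn = attn / attn.sum()                # softmax over the N keys
    cls_out = attn @ v                      # (1, C): updated CLS token
    return np.concatenate([cls_out, x[1:]], axis=0)  # patch tokens pass through

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))
out = class_attention(x, rng.standard_normal((8, 24)), 8 ** -0.5)
```

As in the `construct` above, only row 0 is rewritten; the remaining tokens are concatenated back unchanged.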
mindocr.models.backbones.mindcv_models.xcit.ClassAttentionBlock

Bases: nn.Cell

Class Attention Layer as in CaiT https://arxiv.org/abs/2103.17239

Source code in mindocr\models\backbones\mindcv_models\xcit.py
class ClassAttentionBlock(nn.Cell):
    """Class Attention Layer as in CaiT https://arxiv.org/abs/2103.17239
    """

    def __init__(self, dim, num_heads, mlp_ratio=4., qkv_bias=False, qk_scale=None, drop=0.,
                 attn_drop=0., drop_path=0., act_layer=nn.GELU, norm_layer=nn.LayerNorm, eta=None,
                 tokens_norm=False):
        super().__init__()
        self.norm1 = norm_layer([dim])
        self.attn = ClassAttention(
            dim, num_heads=num_heads, qkv_bias=qkv_bias, qk_scale=qk_scale, attn_drop=attn_drop,
            proj_drop=drop
        )

        self.drop_path = DropPath(
            drop_path) if drop_path > 0. else ops.Identity()
        self.norm2 = norm_layer([dim])
        mlp_hidden_dim = int(dim * mlp_ratio)
        self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer,
                       drop=drop)

        # LayerScale Initialization (no layerscale when None)
        if eta is not None:
            self.gamma1 = Parameter(
                eta * ops.Ones()((dim), mstype.float32), requires_grad=True)
            self.gamma2 = Parameter(
                eta * ops.Ones()((dim), mstype.float32), requires_grad=True)
        else:
            self.gamma1, self.gamma2 = 1.0, 1.0

        # FIXME: A hack for models pre-trained with layernorm over all the tokens not just the CLS
        self.tokens_norm = tokens_norm

    def construct(self, x, H, W, mask=None):

        x = x + self.drop_path(self.gamma1 * self.attn(self.norm1(x)))

        if self.tokens_norm:
            x = self.norm2(x)
        else:
            x[:, 0:1] = self.norm2(x[:, 0:1])
        x_res = x
        cls_token = x[:, 0:1]
        cls_token = self.gamma2 * self.mlp(cls_token)
        x = ops.concat((cls_token, x[:, 1:]), axis=1)
        x = x_res + self.drop_path(x)
        return x
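The `gamma1`/`gamma2` parameters above implement LayerScale: a per-channel gain initialized to a small `eta` that damps each residual branch early in training (and is disabled, i.e. fixed at 1.0, when `eta` is `None`). A minimal NumPy sketch of the initialization-time behaviour:

```python
import numpy as np

def layerscale_residual(x, branch_out, eta=1e-5):
    """LayerScale sketch: residual branch scaled by a small per-channel gamma.

    gamma starts at eta (Parameter(eta * ones(dim)) in the block above)
    and is learned during training; here it is shown at its initial value.
    """
    dim = x.shape[-1]
    gamma = np.full(dim, eta)           # per-channel gain, initialised to eta
    return x + gamma * branch_out       # damped residual update
```

With a small `eta`, the block starts out close to the identity, which is the stabilization trick borrowed from CaiT.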
mindocr.models.backbones.mindcv_models.xcit.ConvPatchEmbed

Bases: nn.Cell

Image to Patch Embedding using multiple convolutional layers

Source code in mindocr\models\backbones\mindcv_models\xcit.py
class ConvPatchEmbed(nn.Cell):
    """ Image to Patch Embedding using multiple convolutional layers
    """

    def __init__(self,
                 img_size: int = 224,
                 patch_size: int = 16,
                 in_chans: int = 3,
                 embed_dim: int = 768
                 ) -> None:
        super().__init__()
        img_size = to_2tuple(img_size)
        patch_size = to_2tuple(patch_size)
        num_patches = (img_size[1] // patch_size[1]) * \
            (img_size[0] // patch_size[0])
        self.img_size = img_size
        self.patch_size = patch_size
        self.num_patches = num_patches

        if patch_size[0] == 16:
            self.proj = nn.SequentialCell([
                conv3x3(3, embed_dim // 8, 2),
                nn.GELU(),
                conv3x3(embed_dim // 8, embed_dim // 4, 2),
                nn.GELU(),
                conv3x3(embed_dim // 4, embed_dim // 2, 2),
                nn.GELU(),
                conv3x3(embed_dim // 2, embed_dim, 2),
            ])
        elif patch_size[0] == 8:
            self.proj = nn.SequentialCell([
                conv3x3(3, embed_dim // 4, 2),
                nn.GELU(),
                conv3x3(embed_dim // 4, embed_dim // 2, 2),
                nn.GELU(),
                conv3x3(embed_dim // 2, embed_dim, 2),
            ])
        else:
            raise ValueError(
                "For convolutional projection, patch size has to be in [8, 16]")

    def construct(self, x, padding_size=None) -> Tensor:
        x = self.proj(x)
        B, C, Hp, Wp = x.shape
        x = ops.reshape(x, (B, C, Hp * Wp))
        x = x.transpose(0, 2, 1)

        return x, (Hp, Wp)
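Each `conv3x3(..., stride=2)` in the stack halves the spatial resolution, so four stride-2 convs give a /16 downsampling (patch size 16) and three give /8. The shape arithmetic can be checked with a small helper:

```python
def conv_patch_grid(img_hw, patch_size):
    """Patch grid (Hp, Wp) produced by the stride-2 conv stack above.

    patch_size 16 -> 4 halvings; patch_size 8 -> 3 halvings.
    """
    steps = {16: 4, 8: 3}[patch_size]
    h, w = img_hw
    for _ in range(steps):
        h, w = h // 2, w // 2   # each conv3x3(stride=2, pad=1) halves the map
    return h, w                  # num_patches = h * w
```

For a 224x224 input this reproduces `num_patches = 196` (patch 16) or `784` (patch 8), matching the `(img_size // patch_size)` computation in `__init__`.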
mindocr.models.backbones.mindcv_models.xcit.LPI

Bases: nn.Cell

Local Patch Interaction module that allows explicit communication between tokens in 3x3 windows to augment the implicit communication performed by the block diagonal scatter attention. Implemented using 2 layers of separable 3x3 convolutions with GeLU and BatchNorm2d

Source code in mindocr\models\backbones\mindcv_models\xcit.py
class LPI(nn.Cell):
    """
    Local Patch Interaction module that allows explicit communication between tokens in 3x3 windows
    to augment the implicit communication performed by the block diagonal scatter attention.
    Implemented using 2 layers of separable 3x3 convolutions with GeLU and BatchNorm2d
    """

    def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.GELU,
                 drop=0., kernel_size=3) -> None:
        super().__init__()
        out_features = out_features or in_features

        padding = kernel_size // 2

        self.conv1 = nn.Conv2d(in_features, out_features, kernel_size=kernel_size,
                               padding=padding, pad_mode='pad', group=out_features, has_bias=True)
        self.act = act_layer()
        self.bn = nn.BatchNorm2d(in_features, use_batch_statistics=True)
        self.conv2 = nn.Conv2d(in_features, out_features, kernel_size=kernel_size,
                               padding=padding, pad_mode='pad', group=out_features, has_bias=True)

    def construct(self, x, H, W) -> Tensor:
        B, N, C = x.shape
        x = ops.reshape(ops.transpose(x, (0, 2, 1)), (B, C, H, W))
        x = self.conv1(x)
        x = self.act(x)
        x = self.bn(x)
        x = self.conv2(x)
        x = ops.transpose(ops.reshape(x, (B, C, N)), (0, 2, 1))

        return x
mindocr.models.backbones.mindcv_models.xcit.PositionalEncodingFourier

Bases: nn.Cell

Positional encoding relying on a Fourier kernel matching the one used in the "Attention Is All You Need" paper. The implementation builds on DeTR code https://github.com/facebookresearch/detr/blob/master/models/position_encoding.py

Source code in mindocr\models\backbones\mindcv_models\xcit.py
class PositionalEncodingFourier(nn.Cell):
    """
    Positional encoding relying on a fourier kernel matching the one used in the
    "Attention is all of Need" paper. The implementation builds on DeTR code
    https://github.com/facebookresearch/detr/blob/master/models/position_encoding.py
    """

    def __init__(self,
                 hidden_dim: int = 32,
                 dim: int = 768,
                 temperature=10000
                 ) -> None:
        super().__init__()
        self.token_projection = nn.Conv2d(
            hidden_dim * 2, dim, kernel_size=1, has_bias=True)
        self.scale = 2 * np.pi
        self.temperature = temperature
        self.hidden_dim = hidden_dim
        self.dim = dim

    def construct(self, B, H, W) -> Tensor:
        mask = Tensor(np.zeros((B, H, W)).astype(bool))
        not_mask = ~mask
        y_embed = not_mask.cumsum(1, dtype=mstype.float32)
        x_embed = not_mask.cumsum(2, dtype=mstype.float32)
        eps = 1e-6
        y_embed = y_embed / (y_embed[:, -1:, :] + eps) * self.scale
        x_embed = x_embed / (x_embed[:, :, -1:] + eps) * self.scale

        dim_t = numpy.arange(self.hidden_dim, dtype=mstype.float32)
        dim_t = self.temperature ** (2 * (dim_t // 2) / self.hidden_dim)

        pos_x = x_embed[:, :, :, None] / dim_t
        pos_y = y_embed[:, :, :, None] / dim_t
        pos_x = ops.stack((ops.sin(pos_x[:, :, :, 0::2]),
                           ops.cos(pos_x[:, :, :, 1::2])), 4)
        x1, x2, x3, x4, x5 = pos_x.shape
        pos_x = ops.reshape(pos_x, (x1, x2, x3, x4 * x5))
        pos_y = ops.stack((ops.sin(pos_y[:, :, :, 0::2]),
                           ops.cos(pos_y[:, :, :, 1::2])), 4)
        y1, y2, y3, y4, y5 = pos_y.shape
        pos_y = ops.reshape(pos_y, (y1, y2, y3, y4 * y5))
        pos = ops.transpose(ops.concat((pos_y, pos_x), 3), (0, 3, 1, 2))
        pos = self.token_projection(pos)
        return pos
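The encoding above normalizes cumulative row/column positions to (0, 2π], builds sin/cos features per axis at geometrically spaced frequencies, and concatenates the two axes before the 1x1 projection. A NumPy sketch of everything up to (but not including) `token_projection`:

```python
import numpy as np

def fourier_pos_encoding(H, W, hidden_dim=32, temperature=10000.0):
    """NumPy sketch of the cumsum-normalised 2D sinusoidal encoding.

    Returns (2 * hidden_dim, H, W): hidden_dim sin/cos features per axis,
    matching the tensor fed into token_projection above.
    """
    scale = 2 * np.pi
    y = np.cumsum(np.ones((H, W)), axis=0)       # row index, 1-based
    x = np.cumsum(np.ones((H, W)), axis=1)       # column index, 1-based
    eps = 1e-6
    y = y / (y[-1:, :] + eps) * scale            # normalise rows to (0, 2pi]
    x = x / (x[:, -1:] + eps) * scale
    dim_t = np.arange(hidden_dim)
    dim_t = temperature ** (2 * (dim_t // 2) / hidden_dim)  # frequency ladder
    pos_x = x[:, :, None] / dim_t                # (H, W, hidden_dim)
    pos_y = y[:, :, None] / dim_t

    def interleave(p):  # sin on even channels, cos on odd, then flatten
        return np.stack([np.sin(p[..., 0::2]),
                         np.cos(p[..., 1::2])], axis=-1).reshape(H, W, -1)

    pos = np.concatenate([interleave(pos_y), interleave(pos_x)], axis=-1)
    return pos.transpose(2, 0, 1)                # channels-first, as in construct
```

Since every channel is a sine or cosine, the output is bounded in [-1, 1] before projection.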
mindocr.models.backbones.mindcv_models.xcit.XCA

Bases: nn.Cell

Cross-Covariance Attention (XCA) operation where the channels are updated using a weighted sum. The weights are obtained from the (softmax normalized) Cross-covariance matrix (Q^T K \in d_h \times d_h)

Source code in mindocr\models\backbones\mindcv_models\xcit.py
class XCA(nn.Cell):

    """ Cross-Covariance Attention (XCA) operation where the channels are updated using a weighted
     sum. The weights are obtained from the (softmax normalized) Cross-covariance
    matrix (Q^T K \\in d_h \\times d_h)
    """

    def __init__(self, dim, num_heads=8, qkv_bias=False, qk_scale=None, attn_drop=0., proj_drop=0.):
        super().__init__()
        self.num_heads = num_heads
        self.temperature = Parameter(
            ops.Ones()((num_heads, 1, 1), mstype.float32))
        self.qkv = nn.Dense(
            in_channels=dim, out_channels=dim * 3, has_bias=qkv_bias)
        self.q_matmul_k = ops.BatchMatMul(transpose_b=True)
        self.softmax = nn.Softmax(axis=-1)
        self.attn_drop = nn.Dropout(keep_prob=1.0 - attn_drop)
        self.attn_matmul_v = ops.BatchMatMul()
        self.proj = nn.Dense(in_channels=dim, out_channels=dim)
        self.proj_drop = nn.Dropout(keep_prob=1.0 - proj_drop)

    def construct(self, x):
        B, N, C = x.shape

        qkv = ops.reshape(
            self.qkv(x), (B, N, 3, self.num_heads, C // self.num_heads))
        qkv = ops.transpose(qkv, (2, 0, 3, 1, 4))
        q, k, v = ops.unstack(qkv, axis=0)

        q = ops.transpose(q, (0, 1, 3, 2))
        k = ops.transpose(k, (0, 1, 3, 2))
        v = ops.transpose(v, (0, 1, 3, 2))

        attn = self.q_matmul_k(q, k) * self.temperature
        attn = self.softmax(attn)
        attn = self.attn_drop(attn)
        x = self.attn_matmul_v(attn, v)
        x = ops.transpose(x, (0, 3, 1, 2))
        x = ops.reshape(x, (B, N, C))
        x = self.proj(x)
        x = self.proj_drop(x)
        return x
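In XCA the softmax-ed matrix lives in channel space: after transposing, `q @ k^T` is d_h x d_h, so cost scales with channels rather than tokens. A minimal single-head NumPy sketch (identity q/k/v projections assumed for brevity, and the learnable `temperature` fixed to a scalar):

```python
import numpy as np

def xca(x, temperature=1.0):
    """Minimal single-head sketch of cross-covariance attention.

    x: (N, C). The attention matrix is (C, C) - independent of the
    number of tokens N, which is why XCiT scales linearly in patches.
    """
    # L2-normalise each channel's N-vector, as XCA does for q and k
    q = x / (np.linalg.norm(x, axis=0, keepdims=True) + 1e-12)
    k, v = q, x
    attn = (q.T @ k) * temperature           # (C, C) cross-covariance matrix
    attn = np.exp(attn - attn.max(axis=-1, keepdims=True))
    attn = attn / attn.sum(axis=-1, keepdims=True)   # softmax over channels
    return v @ attn.T                        # (N, C): channels re-mixed per token

out = xca(np.random.default_rng(1).standard_normal((6, 4)))
```

Doubling the token count leaves the (C, C) attention matrix the same size; contrast with `ClassAttention`/standard attention, whose maps grow with N.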
mindocr.models.backbones.mindcv_models.xcit.XCiT

Bases: nn.Cell

XCiT model class, based on "XCiT: Cross-Covariance Image Transformers" <https://arxiv.org/abs/2106.09681>_

PARAMETER DESCRIPTION
img_size

input image size

TYPE: int, tuple DEFAULT: 224

patch_size

patch size

TYPE: int, tuple DEFAULT: 16

in_chans

number of input channels

TYPE: int DEFAULT: 3

num_classes

number of classes for classification head

TYPE: int DEFAULT: 1000

embed_dim

embedding dimension

TYPE: int DEFAULT: 768

depth

depth of transformer

TYPE: int DEFAULT: 12

num_heads

number of attention heads

TYPE: int DEFAULT: 12

mlp_ratio

ratio of mlp hidden dim to embedding dim

TYPE: int DEFAULT: 4.0

qkv_bias

enable bias for qkv if True

TYPE: bool DEFAULT: True

qk_scale

override default qk scale of head_dim ** -0.5 if set

TYPE: float DEFAULT: None

drop_rate

dropout rate

TYPE: float DEFAULT: 0.0

attn_drop_rate

attention dropout rate

TYPE: float DEFAULT: 0.0

drop_path_rate

stochastic depth rate

TYPE: float DEFAULT: 0.0

norm_layer

normalization layer

TYPE: nn.Cell DEFAULT: None

cls_attn_layers

(int) Depth of Class attention layers

TYPE: int DEFAULT: 2

use_pos

(bool) whether to use positional encoding

TYPE: bool DEFAULT: True

eta

(float) layerscale initialization value

TYPE: float DEFAULT: None

tokens_norm

(bool) Whether to normalize all tokens or just the cls_token in the CA

TYPE: bool DEFAULT: False

Source code in mindocr\models\backbones\mindcv_models\xcit.py
class XCiT(nn.Cell):
    r"""XCiT model class, based on
    `"XCiT: Cross-Covariance Image Transformers" <https://arxiv.org/abs/2106.09681>`_
    Args:
        img_size (int, tuple): input image size
        patch_size (int, tuple): patch size
        in_chans (int): number of input channels
        num_classes (int): number of classes for classification head
        embed_dim (int): embedding dimension
        depth (int): depth of transformer
        num_heads (int): number of attention heads
        mlp_ratio (int): ratio of mlp hidden dim to embedding dim
        qkv_bias (bool): enable bias for qkv if True
        qk_scale (float): override default qk scale of head_dim ** -0.5 if set
        drop_rate (float): dropout rate
        attn_drop_rate (float): attention dropout rate
        drop_path_rate (float): stochastic depth rate
        norm_layer (nn.Cell): normalization layer
        cls_attn_layers: (int) Depth of Class attention layers
        use_pos: (bool) whether to use positional encoding
        eta: (float) layerscale initialization value
        tokens_norm: (bool) Whether to normalize all tokens or just the cls_token in the CA
    """

    def __init__(self,
                 img_size: int = 224,
                 patch_size: int = 16,
                 in_chans: int = 3,
                 num_classes: int = 1000,
                 embed_dim: int = 768,
                 depth: int = 12,
                 num_heads: int = 12,
                 mlp_ratio: int = 4.,
                 qkv_bias: bool = True,
                 qk_scale: float = None,
                 drop_rate: float = 0.,
                 attn_drop_rate: float = 0.,
                 drop_path_rate: float = 0.,
                 norm_layer: nn.Cell = None,
                 cls_attn_layers: int = 2,
                 use_pos: bool = True,
                 patch_proj: str = 'linear',
                 eta: float = None,
                 tokens_norm: bool = False):
        super().__init__()

        self.num_classes = num_classes
        self.num_features = self.embed_dim = embed_dim
        norm_layer = norm_layer or partial(nn.LayerNorm, epsilon=1e-6)

        self.patch_embed = ConvPatchEmbed(img_size=img_size, embed_dim=embed_dim,
                                          patch_size=patch_size)

        num_patches = self.patch_embed.num_patches

        self.cls_token = Parameter(
            ops.zeros((1, 1, embed_dim), mstype.float32))
        self.pos_drop = nn.Dropout(keep_prob=1.0 - drop_rate)

        dpr = [drop_path_rate for i in range(depth)]
        self.blocks = nn.CellList([
            XCABlock(
                dim=embed_dim, num_heads=num_heads, mlp_ratio=mlp_ratio, qkv_bias=qkv_bias,
                qk_scale=qk_scale, drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[i],
                norm_layer=norm_layer, num_tokens=num_patches, eta=eta)
            for i in range(depth)])

        self.cls_attn_blocks = nn.CellList([
            ClassAttentionBlock(
                dim=embed_dim, num_heads=num_heads, mlp_ratio=mlp_ratio, qkv_bias=qkv_bias,
                qk_scale=qk_scale, drop=drop_rate, attn_drop=attn_drop_rate, norm_layer=norm_layer,
                eta=eta, tokens_norm=tokens_norm)
            for i in range(cls_attn_layers)])
        self.norm = norm_layer([embed_dim])
        self.head = nn.Dense(
            in_channels=embed_dim, out_channels=num_classes) if num_classes > 0 else ops.Identity()

        self.pos_embeder = PositionalEncodingFourier(dim=embed_dim)
        self.use_pos = use_pos

        # Classifier head
        self.cls_token.set_data(weight_init.initializer(weight_init.TruncatedNormal(sigma=0.02),
                                                        self.cls_token.shape,
                                                        self.cls_token.dtype))
        self._init_weights()

    def _init_weights(self) -> None:
        for name, m in self.cells_and_names():
            if isinstance(m, nn.Dense):
                m.weight = weight_init.initializer(weight_init.TruncatedNormal(
                    sigma=0.02), m.weight.shape, mindspore.float32)
                if m.bias is not None:
                    m.bias.set_data(weight_init.initializer(
                        weight_init.Constant(0), m.bias.shape))
            elif isinstance(m, nn.LayerNorm):
                m.beta.set_data(weight_init.initializer(
                    weight_init.Constant(0), m.beta.shape))
                m.gamma.set_data(weight_init.initializer(
                    weight_init.Constant(1), m.gamma.shape))

    def forward_features(self, x):
        B, C, H, W = x.shape
        x, (Hp, Wp) = self.patch_embed(x)
        if self.use_pos:
            pos_encoding = self.pos_embeder(B, Hp, Wp).reshape(
                B, -1, x.shape[1]).transpose(0, 2, 1)
            x = x + pos_encoding
        x = self.pos_drop(x)
        for blk in self.blocks:
            x = blk(x, Hp, Wp)
        cls_tokens = ops.broadcast_to(self.cls_token, (B, -1, -1))
        cls_tokens = ops.cast(cls_tokens, x.dtype)
        x = ops.concat((cls_tokens, x), 1)

        for blk in self.cls_attn_blocks:
            x = blk(x, Hp, Wp)
        return self.norm(x)[:, 0]

    def construct(self, x):
        x = self.forward_features(x)
        x = self.head(x)
        return x
mindocr.models.backbones.mindcv_models.xcit.conv3x3(in_planes, out_planes, stride=1)

3x3 convolution with padding

Source code in mindocr\models\backbones\mindcv_models\xcit.py
def conv3x3(in_planes, out_planes, stride=1):
    """3x3 convolution with padding"""
    return nn.SequentialCell([
        nn.Conv2d(
            in_planes, out_planes, kernel_size=3, stride=stride, padding=1, pad_mode='pad', has_bias=False
        ),
        nn.BatchNorm2d(out_planes, use_batch_statistics=True)
    ])
mindocr.models.backbones.mindcv_models.xcit.xcit_tiny_12_p16(pretrained=False, num_classes=1000, in_channels=3, **kwargs)

Get xcit_tiny_12_p16 model. Refer to the base class 'models.XCiT' for more details.

Source code in mindocr\models\backbones\mindcv_models\xcit.py
@register_model
def xcit_tiny_12_p16(pretrained: bool = False, num_classes: int = 1000, in_channels=3, **kwargs) -> XCiT:
    """Get xcit_tiny_12_p16 model.
    Refer to the base class 'models.XCiT' for more details.
    """
    default_cfg = default_cfgs['xcit_tiny_12_p16']
    model = XCiT(
        patch_size=16, num_classes=num_classes, embed_dim=192, depth=12, num_heads=4, mlp_ratio=4, qkv_bias=True,
        norm_layer=partial(nn.LayerNorm, epsilon=1e-6), eta=1.0, tokens_norm=True, **kwargs)
    if pretrained:
        load_pretrained(model, default_cfg,
                        num_classes=num_classes, in_channels=in_channels)

    return model
mindocr.models.backbones.mindcv_wrapper
mindocr.models.backbones.mindcv_wrapper.MindCVBackboneWrapper

Bases: nn.Cell

It reuses the forward_features interface in mindcv models. Please check where the features are extracted.

Note: text recognition models like CRNN expect output features in shape [bs, c, h, w], but some models in mindcv, like ViT, output features in shape [bs, c]. Please check and pick accordingly.

PARAMETER DESCRIPTION
pretrained

Whether the model backbone is pretrained. Default: True

TYPE: bool DEFAULT: True

ckpt_path

The path of checkpoint files. Default: "".

TYPE: str

features_only

Output the features at different strides instead. Default: False

TYPE: bool DEFAULT: False

out_indices

The indices of the output features when features_only is True. Default: [0, 1, 2, 3, 4]

TYPE: list[int] DEFAULT: [0, 1, 2, 3, 4]

Example

network = MindCVBackboneWrapper('resnet50', pretrained=True)

Source code in mindocr\models\backbones\mindcv_wrapper.py
class MindCVBackboneWrapper(nn.Cell):
    '''
    It reuses the forward_features interface in mindcv models. Please check where the features are extracted.

    Note: text recognition models like CRNN expect output features in shape [bs, c, h, w], but some models in mindcv
    like ViT output features in shape [bs, c]. Please check and pick accordingly.

    Args:
        pretrained (bool): Whether the model backbone is pretrained. Default: True
        ckpt_path (str): The path of checkpoint files. Default: None.
        features_only (bool): Output the features at different strides instead. Default: False
        out_indices (list[int]): The indices of the output features when `features_only` is `True`.
             Default: [0, 1, 2, 3, 4]

    Example:
        network = MindCVBackboneWrapper('resnet50', pretrained=True)
    '''

    def __init__(self, name, pretrained=True, ckpt_path=None, features_only: bool = False,
                 out_indices: List[int] = [0, 1, 2, 3, 4], **kwargs):
        super().__init__()
        self.features_only = features_only

        model_name = name.replace('@mindcv', "").replace("mindcv.", "")
        network = mindcv_models.create_model(model_name, pretrained=pretrained, features_only=features_only,
                                             out_indices=out_indices)
        # for local checkpoint
        if ckpt_path is not None:
            checkpoint_param = load_checkpoint(ckpt_path)
            load_param_into_net(network, checkpoint_param)

        if not self.features_only:
            if hasattr(network, 'classifier'):
                del network.classifier  # remove the original header to avoid confusion

            self.network = network
            # probe to get out_channels
            # network.eval()
            # TODO: get image input size from default cfg
            x = ms.Tensor(np.random.rand(2, 3, 224, 224), dtype=ms.float32)
            h = network.forward_features(x)
            h = ops.stop_gradient(h)
            self.out_channels = h.shape[1]

            print(f'INFO: Load MindCV Backbone {model_name}, the output features shape for input 224x224 is {h.shape}. '
                  f'\n\tProbed out_channels : ', self.out_channels)
        else:
            self.network = network
            self.out_channels = self.network.out_channels
            print(f'INFO: Load MindCV Backbone {model_name} with feature index {out_indices}, '
                  f'output channels: {self.out_channels}')

    def construct(self, x):
        if self.features_only:
            features = self.network(x)
            return features
        else:
            features = self.network.forward_features(x)
            return [features]
mindocr.models.backbones.rec_vgg
mindocr.models.backbones.rec_vgg.RecVGG

Bases: nn.Cell

VGG Network structure

Source code in mindocr\models\backbones\rec_vgg.py
@register_backbone_class
class RecVGG(nn.Cell):
    """VGG Network structure"""

    def __init__(self, **kwargs):
        super().__init__()
        self.conv1 = Conv(3, 64, use_bn=False, padding=1)
        self.conv2 = Conv(64, 128, use_bn=False, padding=1)
        self.conv3 = Conv(128, 256, use_bn=True, padding=1)
        self.conv4 = Conv(256, 256, use_bn=False, padding=1)
        self.conv5 = Conv(256, 512, use_bn=True, padding=1)
        self.conv6 = Conv(512, 512, use_bn=False, padding=1)
        self.conv7 = Conv(512, 512, kernel_size=2,
                          pad_mode='valid', padding=0, use_bn=True)
        self.maxpool2d1 = nn.MaxPool2d(
            kernel_size=2, stride=2, pad_mode='same')
        self.maxpool2d2 = nn.MaxPool2d(kernel_size=(
            2, 1), stride=(2, 1), pad_mode='same')

        self.out_channels = 512

    def construct(self, x):
        x = self.conv1(x)
        x = self.maxpool2d1(x)
        x = self.conv2(x)
        x = self.maxpool2d1(x)
        x = self.conv3(x)
        x = self.conv4(x)
        x = self.maxpool2d2(x)
        x = self.conv5(x)
        x = self.conv6(x)
        x = self.maxpool2d2(x)
        x = self.conv7(x)
        return [x]
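The pooling schedule in `construct` is the classic CRNN backbone trick: two symmetric 2x2 pools, then two (2, 1) pools that halve only the height, then a 2x2 'valid' conv that removes the last spatial row. The spatial shapes can be traced with a small helper (assuming input sizes divisible as in the typical 32-high recognition crops):

```python
def rec_vgg_out_hw(h, w):
    """Trace RecVGG's spatial shape through its pools and final conv."""
    h, w = h // 2, w // 2        # maxpool2d1 after conv1 (2x2, stride 2)
    h, w = h // 2, w // 2        # maxpool2d1 after conv2
    h = h // 2                   # maxpool2d2 (2, 1): halves height only
    h = h // 2                   # maxpool2d2 again
    h, w = h - 1, w - 1          # conv7: kernel 2, pad_mode 'valid', stride 1
    return h, w
```

A 32x100 input therefore collapses to a 1-high feature map, i.e. a per-column feature sequence for the recognition head.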
mindocr.models.base_model
mindocr.models.base_model.BaseModel

Bases: nn.Cell

Source code in mindocr\models\base_model.py
class BaseModel(nn.Cell):
    def __init__(self, config: dict):
        """
        Args:
            config (dict): model config

        Inputs:
            x (Tensor): The input tensor feeding into the backbone, neck and head sequentially.
            y (Tensor): The extra input tensor. If it is provided, it will feed into the head. Default: None
        """
        super(BaseModel, self).__init__()

        config = Dict(config)

        if config.transform:
            transform_name = config.transform.pop('name')
            self.transform = build_trans(transform_name, **config.transform)
        else:
            self.transform = None

        backbone_name = config.backbone.pop('name')
        self.backbone = build_backbone(backbone_name, **config.backbone)

        assert hasattr(self.backbone, 'out_channels'), f'Backbones are required to provide out_channels attribute, ' \
                                                       f'but not found in {backbone_name}'

        if 'neck' not in config or config.neck is None:
            neck_name = 'Select'
        else:
            neck_name = config.neck.pop('name')
        self.neck = build_neck(neck_name, in_channels=self.backbone.out_channels, **config.neck)

        assert hasattr(self.neck, 'out_channels'), f'Necks are required to provide out_channels attribute, ' \
                                                   f'but not found in {neck_name}'

        head_name = config.head.pop('name')
        self.head = build_head(head_name, in_channels=self.neck.out_channels, **config.head)

        self.model_name = f'{backbone_name}_{neck_name}_{head_name}'

    def construct(self, x, y=None):
        if self.transform is not None:
            x = self.transform(x)

        # TODO: return bout, hout for debugging, using a dict.
        bout = self.backbone(x)

        nout = self.neck(bout)

        if y is not None:
            hout = self.head(nout, y)
        else:
            hout = self.head(nout)

        # resize back for postprocess
        # y = F.interpolate(y, size=(H, W), mode='bilinear', align_corners=True)

        return hout
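Stripped of MindSpore specifics, `BaseModel.construct` is a straight pipeline with an optional transform and an extra input `y` routed only to the head. A plain-Python sketch of that control flow:

```python
def base_model_forward(x, transform, backbone, neck, head, y=None):
    """Plain-Python sketch of BaseModel.construct: optional transform,
    then backbone -> neck -> head, with y (if given) fed only to the head."""
    if transform is not None:
        x = transform(x)
    bout = backbone(x)
    nout = neck(bout)
    return head(nout, y) if y is not None else head(nout)
```

With toy callables standing in for the modules, the routing of `y` (used e.g. for attention-based heads at training time) is easy to verify.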
mindocr.models.base_model.BaseModel.__init__(config)
PARAMETER DESCRIPTION
config

model config

TYPE: dict

Inputs

x (Tensor): The input tensor feeding into the backbone, neck and head sequentially. y (Tensor): The extra input tensor. If it is provided, it will feed into the head. Default: None

Source code in mindocr\models\base_model.py
def __init__(self, config: dict):
    """
    Args:
        config (dict): model config

    Inputs:
        x (Tensor): The input tensor feeding into the backbone, neck and head sequentially.
        y (Tensor): The extra input tensor. If it is provided, it will feed into the head. Default: None
    """
    super(BaseModel, self).__init__()

    config = Dict(config)

    if config.transform:
        transform_name = config.transform.pop('name')
        self.transform = build_trans(transform_name, **config.transform)
    else:
        self.transform = None

    backbone_name = config.backbone.pop('name')
    self.backbone = build_backbone(backbone_name, **config.backbone)

    assert hasattr(self.backbone, 'out_channels'), f'Backbones are required to provide out_channels attribute, ' \
                                                   f'but not found in {backbone_name}'

    if 'neck' not in config or config.neck is None:
        neck_name = 'Select'
    else:
        neck_name = config.neck.pop('name')
    self.neck = build_neck(neck_name, in_channels=self.backbone.out_channels, **config.neck)

    assert hasattr(self.neck, 'out_channels'), f'Necks are required to provide out_channels attribute, ' \
                                               f'but not found in {neck_name}'

    head_name = config.head.pop('name')
    self.head = build_head(head_name, in_channels=self.neck.out_channels, **config.head)

    self.model_name = f'{backbone_name}_{neck_name}_{head_name}'
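The constructor above consumes a nested config dict in which each component carries a `name` key that is popped before the remaining entries are forwarded to the corresponding builder. A minimal, framework-free sketch of that dispatch pattern (the component and parameter names below are hypothetical, not taken from a real MindOCR config):

```python
# Hypothetical architecture config in the shape BaseModel.__init__ expects.
config = {
    "backbone": {"name": "det_resnet50", "pretrained": False},
    "neck": {"name": "DBFPN", "out_channels": 256},
    "head": {"name": "DBHead", "k": 50, "adaptive": True},
}

def pop_component(cfg, key):
    """Pop the component name; the leftover dict becomes builder kwargs."""
    sub = dict(cfg[key])      # copy so the original config is untouched
    name = sub.pop("name")
    return name, sub

backbone_name, backbone_kwargs = pop_component(config, "backbone")
print(backbone_name, backbone_kwargs)  # det_resnet50 {'pretrained': False}
```

Note that the real constructor pops `name` in place, which is why a config dict cannot be reused to build a second model without re-parsing it.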
mindocr.models.builder

build models

mindocr.models.builder.build_model(name_or_config, **kwargs)

There are two ways to build a model: 1. load a predefined model according to the given model name; 2. build the model according to the detailed configuration of each module (transform, backbone, neck and head), for lower-level architecture customization.

PARAMETER DESCRIPTION
name_or_config

model name or config. If it's a string, it should be a model name (which can be found by mindocr.list_models()); if it's a dict, it should be an architecture configuration defining the backbone/neck/head components (e.g., parsed from yaml config).

TYPE: Union[dict, str]

kwargs

options. If name_or_config is a model name, supported args in kwargs are: - pretrained (bool): if True, a pretrained checkpoint will be downloaded and loaded into the network. - ckpt_load_path (str): path to a checkpoint file; if a non-empty string is given, the local checkpoint will be loaded into the network. If name_or_config is an architecture definition dict, supported args are: - ckpt_load_path (str): path to a checkpoint file.

TYPE: dict DEFAULT: {}

Return

nn.Cell

from mindocr.models import build_model
net = build_model(cfg['model'])
net = build_model(cfg['model'], ckpt_load_path='./r50_fpn_dbhead.ckpt')  # build network and load checkpoint
net = build_model('dbnet_resnet50', pretrained=True)

Source code in mindocr\models\builder.py
def build_model(name_or_config: Union[str, dict], **kwargs):
    """
    There are two ways to build a model.
        1. load a predefined model according to the given model name.
        2. build the model according to the detailed configuration of each module (transform, backbone, neck and
        head), for lower-level architecture customization.

    Args:
        name_or_config (Union[dict, str]): model name or config
            if it's a string, it should be a model name (which can be found by mindocr.list_models())
            if it's a dict, it should be an architecture configuration defining the backbone/neck/head components
            (e.g., parsed from yaml config).

        kwargs (dict): options
            if name_or_config is a model name, supported args in kwargs are:
                - pretrained (bool): if True, pretrained checkpoint will be downloaded and loaded into the network.
                - ckpt_load_path (str): path to checkpoint file. If a non-empty string is given, the local checkpoint
                  will be loaded into the network.
            if name_or_config is an architecture definition dict, supported args are:
                - ckpt_load_path (str): path to checkpoint file.

    Return:
        nn.Cell

    Example:
    >>>  from mindocr.models import build_model
    >>>  net = build_model(cfg['model'])
    >>>  net = build_model(cfg['model'], ckpt_load_path='./r50_fpn_dbhead.ckpt') # build network and load checkpoint
    >>>  net = build_model('dbnet_resnet50', pretrained=True)

    """
    is_customized_model = True
    if isinstance(name_or_config, str):
        # build model by specific model name
        model_name = name_or_config
        if is_model(model_name):
            create_fn = model_entrypoint(model_name)
            network = create_fn(**kwargs)
        else:
            raise ValueError(
                f"Invalid model name: {model_name}. Supported models are {list_models()}"
            )
        is_customized_model = False
    elif isinstance(name_or_config, dict):
        network = BaseModel(name_or_config)
    else:
        raise ValueError("Type error for config")

    # load checkpoint
    if "ckpt_load_path" in kwargs:
        load_from = kwargs["ckpt_load_path"]
        if isinstance(load_from, bool) and is_customized_model:
            raise ValueError(
                "Cannot find the pretrained checkpoint for a customized model without giving the url or local path "
                "to the checkpoint.\nPlease specify the url or local path by setting `model-pretrained` (if training) "
                "or `eval-ckpt_load_path` (if evaluation) in the yaml config"
            )

        load_model(network, load_from)

    if "amp_level" in kwargs:
        auto_mixed_precision(network, amp_level=kwargs["amp_level"])

    return network
mindocr.models.cls_mv3
mindocr.models.det_dbnet
mindocr.models.det_east
mindocr.models.det_psenet
mindocr.models.heads
mindocr.models.heads.build_head(head_name, **kwargs)

Build Head network.

PARAMETER DESCRIPTION
head_name

the head layer(s) name, which should be one of the supported_heads.

TYPE: str

kwargs

input args for the head network

TYPE: dict DEFAULT: {}

Return

nn.Cell for head module

Construct: Input: Tensor; Output: Dict[Tensor]

Example: build CTCHead

from mindocr.models.heads import build_head
config = dict(head_name='CTCHead', in_channels=256, out_channels=37)
head = build_head(**config)
print(head)

Source code in mindocr\models\heads\builder.py
def build_head(head_name, **kwargs):
    """
    Build Head network.

    Args:
        head_name (str): the head layer(s) name, which should be one of the supported_heads.
        kwargs (dict): input args for the head network

    Return:
        nn.Cell for head module

    Construct:
        Input: Tensor
        Output: Dict[Tensor]

    Example:
        >>> # build CTCHead
        >>> from mindocr.models.heads import build_head
        >>> config = dict(head_name='CTCHead', in_channels=256, out_channels=37)
        >>> head = build_head(**config)
        >>> print(head)
    """
    assert head_name in supported_heads, f'Invalid head {head_name}. Supported heads are {supported_heads}'
    head = eval(head_name)(**kwargs)
    return head
mindocr.models.heads.builder
mindocr.models.heads.builder.build_head(head_name, **kwargs)

mindocr.models.heads.cls_mv3_head
mindocr.models.heads.cls_mv3_head.ClsHead

Bases: nn.Cell

Text direction classification head.

Source code in mindocr\models\heads\cls_mv3_head.py
class ClsHead(nn.Cell):
    """
    Text direction classification head.
    """
    def __init__(self, in_channels: int, hidden_channels: int, num_classes: int):
        super().__init__()
        self.pool = GlobalAvgPooling()
        self.classifier = nn.SequentialCell([
            nn.Dense(in_channels, hidden_channels),
            nn.HSwish(),
            nn.Dropout(keep_prob=0.8),
            nn.Dense(hidden_channels, num_classes),
        ])
        self.softmax = nn.Softmax(axis=-1)

    def construct(self, x: Tensor) -> Tensor:
        x = self.pool(x)
        x = x.astype('float32')
        x = self.classifier(x)
        x = self.softmax(x)
        return x
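ClsHead ends with a softmax over the class axis to turn the classifier logits into direction probabilities. A minimal plain-Python sketch of that final step (the logit values below are made up for illustration):

```python
import math

def softmax(logits):
    # Numerically stable softmax over one logit vector, the same operation
    # applied by the final nn.Softmax(axis=-1) in ClsHead.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for direction classes; output sums to 1.
probs = softmax([2.0, 0.5, -1.0])
```

Subtracting the max before exponentiating avoids overflow without changing the result.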
mindocr.models.heads.det_db_head
mindocr.models.heads.det_db_head.DBHead

Bases: nn.Cell

Source code in mindocr\models\heads\det_db_head.py
class DBHead(nn.Cell):
    def __init__(self, in_channels: int, k=50, adaptive=False, bias=False, weight_init='HeUniform'):
        super().__init__()
        self.adaptive = adaptive

        self.segm = self._init_heatmap(in_channels, in_channels // 4, weight_init, bias)
        if self.adaptive:
            self.thresh = self._init_heatmap(in_channels, in_channels // 4, weight_init, bias)
            self.k = k
            self.diff_bin = nn.Sigmoid()

    @staticmethod
    def _init_heatmap(in_channels, inter_channels, weight_init, bias):
        return nn.SequentialCell([  # `pred` block from the original work
            nn.Conv2d(in_channels, inter_channels, kernel_size=3, padding=1, pad_mode='pad', has_bias=bias,
                      weight_init=weight_init),
            nn.BatchNorm2d(inter_channels),
            nn.ReLU(),
            nn.Conv2dTranspose(inter_channels, inter_channels, kernel_size=2, stride=2, pad_mode='valid', has_bias=True,
                               weight_init=weight_init),
            nn.BatchNorm2d(inter_channels),
            nn.ReLU(),
            nn.Conv2dTranspose(inter_channels, 1, kernel_size=2, stride=2, pad_mode='valid', has_bias=True,
                               weight_init=weight_init),
            nn.Sigmoid()
        ])

    def construct(self, features: ms.Tensor) -> Union[ms.Tensor, Tuple[ms.Tensor, ...]]:
        """
        Args:
            features (Tensor): encoded features
        Returns:
            binary: predicted binary map
            thresh: predicted threshold map (only returned if adaptive is True in training)
            thresh_binary: differentiable binary map (only returned if adaptive is True in training)
        """
        binary = self.segm(features)

        if self.adaptive and self.training:
            # only use binary map to derive polygons in inference
            thresh = self.thresh(features)
            thresh_binary = self.diff_bin(self.k * binary - thresh)  # Differentiable Binarization
            return binary, thresh, thresh_binary

        return binary
mindocr.models.heads.det_db_head.DBHead.construct(features)
PARAMETER DESCRIPTION
features

encoded features

TYPE: Tensor

RETURNS DESCRIPTION
Union[ms.Tensor, Tuple[ms.Tensor, ...]]

binary: predicted binary map
thresh: predicted threshold map (only returned if adaptive is True in training)
thresh_binary: differentiable binary map (only returned if adaptive is True in training)

Source code in mindocr\models\heads\det_db_head.py
def construct(self, features: ms.Tensor) -> Union[ms.Tensor, Tuple[ms.Tensor, ...]]:
    """
    Args:
        features (Tensor): encoded features
    Returns:
        binary: predicted binary map
        thresh: predicted threshold map (only returned if adaptive is True in training)
        thresh_binary: differentiable binary map (only returned if adaptive is True in training)
    """
    binary = self.segm(features)

    if self.adaptive and self.training:
        # only use binary map to derive polygons in inference
        thresh = self.thresh(features)
        thresh_binary = self.diff_bin(self.k * binary - thresh)  # Differentiable Binarization
        return binary, thresh, thresh_binary

    return binary
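The differentiable-binarization step above, `self.diff_bin(self.k * binary - thresh)`, can be illustrated element-wise in plain Python. With k=50 the sigmoid behaves as a sharp but differentiable approximation of hard thresholding on the probability map (the sample pixel values below are made up):

```python
import math

def diff_binarize(prob, thresh, k=50.0):
    # Element-wise sigmoid(k * prob - thresh), mirroring the thresh_binary
    # computation in DBHead.construct for one pixel of the maps.
    return 1.0 / (1.0 + math.exp(-(k * prob - thresh)))

high = diff_binarize(0.9, 0.3)  # confident text pixel -> close to 1
low = diff_binarize(0.0, 0.3)   # background pixel -> below 0.5
```

Because the sigmoid is smooth, gradients still flow through this near-binary map during training, which is the point of Differentiable Binarization.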
mindocr.models.heads.rec_attn_head
mindocr.models.heads.rec_attn_head.AttentionHead

Bases: nn.Cell

Source code in mindocr\models\heads\rec_attn_head.py
class AttentionHead(nn.Cell):
    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        hidden_size: int = 256,
        batch_max_length: int = 25,
    ) -> None:
        """
        Inputs:
            inputs: shape [W, BS, 2*C]
            label: shape [BS, W]
        """
        super().__init__()
        self.in_channels = in_channels
        self.hidden_size = hidden_size
        self.num_classes = out_channels
        self.batch_max_length = batch_max_length

        self.attention_cell = AttentionCell(
            self.in_channels, self.hidden_size, self.num_classes
        )
        self.generator = nn.Dense(hidden_size, self.num_classes)

        self.one = Tensor(1.0, ms.float32)
        self.zero = Tensor(0.0, ms.float32)

        self.argmax = ops.Argmax(axis=1)

    def _char_to_onehot(self, input_char: Tensor, onehot_dim: int) -> Tensor:
        input_one_hot = ops.one_hot(input_char, onehot_dim, self.one, self.zero)
        return input_one_hot

    def construct(self, inputs: Tensor, targets: Optional[Tensor] = None) -> Tensor:
        # convert the inputs from [W, BS, C] to [BS, W, C]
        inputs = ops.transpose(inputs, (1, 0, 2))
        N = inputs.shape[0]
        num_steps = self.batch_max_length + 1  # for <STOP> symbol

        hidden = ops.zeros((N, self.hidden_size), inputs.dtype)

        if targets is not None:
            # training branch
            output_hiddens = list()
            for i in range(num_steps):
                char_onehots = self._char_to_onehot(targets[:, i], self.num_classes)
                hidden, _ = self.attention_cell(hidden, inputs, char_onehots)
                output_hiddens.append(ops.expand_dims(hidden, axis=1))
            output = ops.concat(output_hiddens, axis=1)
            probs = self.generator(output)
        else:
            # inference branch
            # <GO> symbol
            targets = ops.zeros((N,), ms.int32)
            probs = list()
            for i in range(num_steps):
                char_onehots = self._char_to_onehot(targets, self.num_classes)
                hidden, _ = self.attention_cell(hidden, inputs, char_onehots)
                probs_step = self.generator(hidden)
                probs.append(probs_step)
                next_input = self.argmax(probs_step)
                targets = next_input
            probs = ops.stack(probs, axis=1)
            probs = ops.softmax(probs, axis=2)
        return probs
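The inference branch above runs an autoregressive greedy loop: at each step the argmax of the previous step's scores is fed back in as the next input symbol. A framework-free sketch of that control flow, where `toy_step` is a hypothetical stand-in for the attention cell plus generator:

```python
def greedy_decode(toy_step, num_steps, start_symbol=0):
    # Autoregressive greedy decoding: feed the previous argmax back in,
    # as AttentionHead.construct does when no targets are given.
    target = start_symbol                      # <GO> symbol
    decoded = []
    for _ in range(num_steps):
        scores = toy_step(target)              # per-class scores for this step
        target = max(range(len(scores)), key=scores.__getitem__)  # argmax
        decoded.append(target)
    return decoded

# Toy 3-class "model" that alternates between predicting class 1 and class 0.
toy_step = lambda prev: [0.1, 0.8, 0.1] if prev == 0 else [0.7, 0.2, 0.1]
print(greedy_decode(toy_step, 4))  # [1, 0, 1, 0]
```

During training this loop is not needed: the ground-truth targets are fed in (teacher forcing), so all steps see correct previous symbols.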
mindocr.models.heads.rec_attn_head.AttentionHead.__init__(in_channels, out_channels, hidden_size=256, batch_max_length=25)
Inputs
Source code in mindocr\models\heads\rec_attn_head.py
def __init__(
    self,
    in_channels: int,
    out_channels: int,
    hidden_size: int = 256,
    batch_max_length: int = 25,
) -> None:
    """
    Inputs:
        inputs: shape [W, BS, 2*C]
        label: shape [BS, W]
    """
    super().__init__()
    self.in_channels = in_channels
    self.hidden_size = hidden_size
    self.num_classes = out_channels
    self.batch_max_length = batch_max_length

    self.attention_cell = AttentionCell(
        self.in_channels, self.hidden_size, self.num_classes
    )
    self.generator = nn.Dense(hidden_size, self.num_classes)

    self.one = Tensor(1.0, ms.float32)
    self.zero = Tensor(0.0, ms.float32)

    self.argmax = ops.Argmax(axis=1)
mindocr.models.heads.rec_ctc_head
mindocr.models.heads.rec_ctc_head.CTCHead

Bases: nn.Cell

An MLP module for CTC Loss. For CRNN, the input should be in shape [W, BS, 2*C], which is output by RNNEncoder. The MLP encodes and classifies the features, then returns a logit tensor in shape [W, BS, num_classes]. For Chinese text, num_classes can be over 60,000, so weight regularization may matter.

Source code in mindocr\models\heads\rec_ctc_head.py
class CTCHead(nn.Cell):
    """
    An MLP module for CTC Loss.
    For CRNN, the input should be in shape [W, BS, 2*C], which is output by RNNEncoder.
    The MLP encodes and classifies the features, then returns a logit tensor in shape [W, BS, num_classes].
    For Chinese text, num_classes can be over 60,000, so weight regularization may matter.

    Args:

    Example:

    """

    # TODO: add dropout regularization.
    #  I think it will benefit the performance of a 2-layer MLP for Chinese text recognition.
    def __init__(self,
                 in_channels,
                 out_channels,
                 # fc_decay: float=0.0004,
                 mid_channels: int = None,
                 return_feats: bool = False,
                 weight_init: str = 'normal',  # 'xavier_uniform',
                 bias_init: str = 'zeros',  # 'xavier_uniform',
                 dropout: float = 0.):
        super().__init__()
        # TODO:
        #  Diff:
        #    1. paddle initializes weight and bias with a Xavier Uniform variant.
        #    2. paddle uses L2 decay on FC weight and bias with specified decay factor fc_decay 0.00002.

        self.out_channels = out_channels
        self.mid_channels = mid_channels
        self.return_feats = return_feats

        if weight_init == "crnn_customised":
            weight_init = crnn_head_initialization(in_channels)

        if bias_init == "crnn_customised":
            bias_init = crnn_head_initialization(in_channels)

        # TODO: paddle does not use the exact XavierUniform; check which initialization works better.
        # w_init = 'xavier_uniform'
        # b_init = 'xavier_uniform'
        if mid_channels is None:
            self.dense1 = nn.Dense(in_channels, out_channels, weight_init=weight_init, bias_init=bias_init)
        else:
            # TODO: paddle did not use activation after linear, why no activation?
            self.dense1 = nn.Dense(in_channels, mid_channels, weight_init=weight_init, bias_init=bias_init)
            # self.activation = nn.GeLU()
            self.dropout = nn.Dropout(keep_prob=1 - dropout)
            self.dense2 = nn.Dense(mid_channels, out_channels, weight_init=weight_init, bias_init=bias_init)
            # self.dropout = nn.Dropout(keep_prob)

    def construct(self, x: ms.Tensor) -> ms.Tensor:
        """Feed Forward construct.
        Args:
            x (Tensor): feature in shape [W, BS, 2*C]
        Returns:
            h (Tensor): if training, h is logits in shape [W, BS, num_classes], where W - sequence len, fixed.
                (dim order required by ms.ctcloss)
                        if not training, h is class probabilities in shape [BS, W, num_classes].
        """
        h = self.dense1(x)
        # x = self.dropout(x)
        if self.mid_channels is not None:
            h = self.dropout(h)
            h = self.dense2(h)

        if not self.training:
            # h = ops.softmax(h, axis=2) # not support on ms 1.8.1
            h = ops.Softmax(axis=2)(h)
            h = h.transpose((1, 0, 2))

        return h
mindocr.models.heads.rec_ctc_head.CTCHead.construct(x)

Feed Forward construct.

PARAMETER DESCRIPTION
x

feature in shape [W, BS, 2*C]

TYPE: Tensor

RETURNS DESCRIPTION
h

if training, h is logits in shape [W, BS, num_classes], where W - sequence len, fixed (dim order required by ms.ctcloss); if not training, h is class probabilities in shape [BS, W, num_classes].

TYPE: Tensor

Source code in mindocr\models\heads\rec_ctc_head.py
def construct(self, x: ms.Tensor) -> ms.Tensor:
    """Feed Forward construct.
    Args:
        x (Tensor): feature in shape [W, BS, 2*C]
    Returns:
        h (Tensor): if training, h is logits in shape [W, BS, num_classes], where W - sequence len, fixed.
            (dim order required by ms.ctcloss)
                    if not training, h is class probabilities in shape [BS, W, num_classes].
    """
    h = self.dense1(x)
    # x = self.dropout(x)
    if self.mid_channels is not None:
        h = self.dropout(h)
        h = self.dense2(h)

    if not self.training:
        # h = ops.softmax(h, axis=2) # not support on ms 1.8.1
        h = ops.Softmax(axis=2)(h)
        h = h.transpose((1, 0, 2))

    return h
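At inference time CTCHead transposes its output from [W, BS, num_classes] to [BS, W, num_classes]. A nested-list sketch of that axis swap, with tiny made-up class vectors standing in for real probabilities:

```python
def transpose_wb(seq):
    # seq holds W timesteps, each with BS per-sample rows; swap to
    # BS samples, each with W timestep rows (last axis untouched),
    # like h.transpose((1, 0, 2)) in CTCHead.construct.
    W, BS = len(seq), len(seq[0])
    return [[seq[w][b] for w in range(W)] for b in range(BS)]

# W=2 timesteps, BS=2 samples, 3 classes each.
seq = [[[1, 0, 0], [0, 1, 0]],
       [[0, 0, 1], [1, 0, 0]]]
out = transpose_wb(seq)  # out[0] gathers sample 0 across both timesteps
```

The [W, BS, ...] order at training time exists only because ms.ctcloss requires the time axis first; downstream postprocessing expects batch-first.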
mindocr.models.necks
mindocr.models.necks.build_neck(neck_name, **kwargs)

Build Neck network.

PARAMETER DESCRIPTION
neck_name

the neck name, which should be one of the supported_necks.

TYPE: str

kwargs

input args for the neck network

TYPE: dict DEFAULT: {}

Return

nn.Cell for neck module

Construct: Input: Tensor; Output: Dict[Tensor]

Example: build RNNEncoder

from mindocr.models.necks import build_neck
config = dict(neck_name='RNNEncoder', in_channels=128, hidden_size=256)
neck = build_neck(**config)
print(neck)

Source code in mindocr\models\necks\builder.py
def build_neck(neck_name, **kwargs):
    """
    Build Neck network.

    Args:
        neck_name (str): the neck name, which should be one of the supported_necks.
        kwargs (dict): input args for the neck network

    Return:
        nn.Cell for neck module

    Construct:
        Input: Tensor
        Output: Dict[Tensor]

    Example:
        >>> # build RNNEncoder
        >>> from mindocr.models.necks import build_neck
        >>> config = dict(neck_name='RNNEncoder', in_channels=128, hidden_size=256)
        >>> neck = build_neck(**config)
        >>> print(neck)
    """
    assert neck_name in supported_necks, f'Invalid neck: {neck_name}. Supported necks are {supported_necks}'
    neck = eval(neck_name)(**kwargs)
    return neck
mindocr.models.necks.asf
mindocr.models.necks.asf.AdaptiveScaleFusion

Bases: nn.Cell

Adaptive Scale Fusion module from the DBNet++ paper (https://arxiv.org/abs/2202.10304).

PARAMETER DESCRIPTION
channels

number of input to and output channels from ASF

channel_attention

use channel attention

DEFAULT: True

Source code in mindocr\models\necks\asf.py
class AdaptiveScaleFusion(nn.Cell):
    """
    Adaptive Scale Fusion module from the `DBNet++ <https://arxiv.org/abs/2202.10304>`__ paper.
    Args:
        channels: number of input to and output channels from ASF
        channel_attention: use channel attention
    """
    def __init__(self, channels, channel_attention=True, weight_init='HeUniform'):
        super().__init__()
        out_channels = channels // 4
        self.conv = nn.Conv2d(channels, out_channels, kernel_size=3, padding=1, pad_mode='pad', has_bias=True,
                              weight_init=weight_init)

        self.chan_att = nn.SequentialCell([
            nn.Conv2d(out_channels, out_channels // 4, kernel_size=1, pad_mode='valid', weight_init=weight_init),
            nn.ReLU(),
            nn.Conv2d(out_channels // 4, out_channels, kernel_size=1, pad_mode='valid', weight_init=weight_init),
            nn.Sigmoid()
        ]) if channel_attention else None

        self.spat_att = nn.SequentialCell([
            nn.Conv2d(1, 1, kernel_size=3, padding=1, pad_mode='pad', weight_init=weight_init),
            nn.ReLU(),
            nn.Conv2d(1, 1, kernel_size=1, pad_mode='valid', weight_init=weight_init),
            nn.Sigmoid()
        ])

        self.scale_att = nn.SequentialCell([
            nn.Conv2d(out_channels, 4, kernel_size=1, pad_mode='valid', weight_init=weight_init),
            nn.Sigmoid()
        ])

    def construct(self, x):
        reduced = self.conv(ops.concat(x, axis=1))

        if self.chan_att is not None:
            ada_pool = ops.mean(reduced, axis=(-2, -1), keep_dims=True)  # equivalent to nn.AdaptiveAvgPool2d(1)
            reduced = self.chan_att(ada_pool) + reduced

        spatial = ops.mean(reduced, axis=1, keep_dims=True)
        spat_att = self.spat_att(spatial) + reduced

        scale_att = self.scale_att(spat_att)
        return ops.concat([scale_att[:, i:i + 1] * x[i] for i in range(len(x))], axis=1)
mindocr.models.necks.builder
mindocr.models.necks.builder.build_neck(neck_name, **kwargs)

mindocr.models.necks.fpn
mindocr.models.necks.fpn.DBFPN

Bases: nn.Cell

Source code in mindocr\models\necks\fpn.py
class DBFPN(nn.Cell):
    def __init__(self, in_channels, out_channels=256, weight_init='HeUniform',
                 bias=False, use_asf=False, channel_attention=True):
        """
        in_channels: resnet18=[64, 128, 256, 512]
                    resnet50=[2048,1024,512,256]
        out_channels: Inner channels in Conv2d

        bias: Whether conv layers have bias or not.
        use_asf: use ASF module for multi-scale feature aggregation (DBNet++ only)
        channel_attention: use channel attention in ASF module
        """
        super().__init__()
        self.out_channels = out_channels

        self.unify_channels = nn.CellList(
            [nn.Conv2d(ch, out_channels, 1, pad_mode='valid', has_bias=bias, weight_init=weight_init)
             for ch in in_channels]
        )

        self.out = nn.CellList(
            [nn.Conv2d(out_channels, out_channels // 4, 3, padding=1, pad_mode='pad', has_bias=bias,
                       weight_init=weight_init) for _ in range(len(in_channels))]
        )

        self.fuse = AdaptiveScaleFusion(out_channels, channel_attention, weight_init) if use_asf else ops.Concat(axis=1)

    def construct(self, features):
        for i, uc_op in enumerate(self.unify_channels):
            features[i] = uc_op(features[i])

        for i in range(2, -1, -1):
            features[i] += _resize_nn(features[i + 1], shape=features[i].shape[2:])

        for i, out in enumerate(self.out):
            features[i] = _resize_nn(out(features[i]), shape=features[0].shape[2:])

        return self.fuse(features[::-1])   # matching the reverse order of the original work
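The second loop in `construct` above runs a top-down pass: each coarser level is nearest-neighbor upsampled to the next finer level's size and added in. A 1-D plain-Python sketch of that pass (the 1-D `resize_nn_1d` is a hypothetical stand-in for the 2-D `_resize_nn` used by DBFPN, and the tiny "feature maps" are made up):

```python
def resize_nn_1d(v, size):
    # Nearest-neighbor resize of a 1-D "feature map" to the target length.
    step = len(v) / size
    return [v[int(i * step)] for i in range(size)]

def top_down_fuse(features):
    # features ordered fine -> coarse; add each upsampled coarser level
    # into the next finer one, as in DBFPN.construct's second loop.
    for i in range(len(features) - 2, -1, -1):
        up = resize_nn_1d(features[i + 1], len(features[i]))
        features[i] = [a + b for a, b in zip(features[i], up)]
    return features

fused = top_down_fuse([[1, 1, 1, 1], [2, 2], [3]])  # coarse info reaches the finest level
```

After this pass every level carries information from all coarser levels, which is what the subsequent per-level `out` convolutions and the final concatenation rely on.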
mindocr.models.necks.fpn.DBFPN.__init__(in_channels, out_channels=256, weight_init='HeUniform', bias=False, use_asf=False, channel_attention=True)
in_channels: resnet18=[64, 128, 256, 512], resnet50=[2048, 1024, 512, 256]
out_channels: inner channels in Conv2d
bias: whether conv layers have bias or not
use_asf: use ASF module for multi-scale feature aggregation (DBNet++ only)
channel_attention: use channel attention in ASF module

Source code in mindocr\models\necks\fpn.py
def __init__(self, in_channels, out_channels=256, weight_init='HeUniform',
             bias=False, use_asf=False, channel_attention=True):
    """
    in_channels: resnet18=[64, 128, 256, 512]
                resnet50=[2048,1024,512,256]
    out_channels: Inner channels in Conv2d

    bias: Whether conv layers have bias or not.
    use_asf: use ASF module for multi-scale feature aggregation (DBNet++ only)
    channel_attention: use channel attention in ASF module
    """
    super().__init__()
    self.out_channels = out_channels

    self.unify_channels = nn.CellList(
        [nn.Conv2d(ch, out_channels, 1, pad_mode='valid', has_bias=bias, weight_init=weight_init)
         for ch in in_channels]
    )

    self.out = nn.CellList(
        [nn.Conv2d(out_channels, out_channels // 4, 3, padding=1, pad_mode='pad', has_bias=bias,
                   weight_init=weight_init) for _ in range(len(in_channels))]
    )

    self.fuse = AdaptiveScaleFusion(out_channels, channel_attention, weight_init) if use_asf else ops.Concat(axis=1)
mindocr.models.necks.img2seq
mindocr.models.necks.img2seq.Img2Seq

Bases: nn.Cell

Source code in mindocr\models\necks\img2seq.py
class Img2Seq(nn.Cell):
    """
    Inputs: feature list with shape [N, C, 1, W]
    Outputs: first feature with shape [W, N, C]
    """

    def __init__(self, in_channels: int) -> None:
        super().__init__()
        self.out_channels = in_channels

    def construct(self, features: List[Tensor]) -> Tensor:
        x = features[0]
        x = ops.squeeze(x, axis=2)
        x = ops.transpose(x, (2, 0, 1))
        return x
mindocr.models.necks.rnn
mindocr.models.necks.rnn.RNNEncoder

Bases: nn.Cell

CRNN sequence encoder which contains reshape and bidirectional LSTM layers. It receives visual features [N, C, 1, W], reshapes them to [W, N, C], and uses a Bi-LSTM to encode them into new features of shape [W, N, 2*C], where W - seq len, N - batch size, C - feature len.

PARAMETER DESCRIPTION
input_channels

C, number of input channels, corresponding to feature length

TYPE: int

hidden_size(int)

the hidden size in LSTM layers, default is 512

Source code in mindocr\models\necks\rnn.py
class RNNEncoder(nn.Cell):
    """
     CRNN sequence encoder which contains reshape and bidirectional LSTM layers.
     Receive visual features [N, C, 1, W]
     Reshape features to shape [W, N, C]
     Use Bi-LSTM to encode into new features in shape [W, N, 2*C].
     where W - seq len, N - batch size, C - feature len

     Args:
        input_channels (int):  C, number of input channels, corresponding to feature length
        hidden_size(int): the hidden size in LSTM layers, default is 512
     """

    def __init__(self, in_channels, hidden_size=512, batch_size=None):
        super().__init__()
        self.out_channels = 2 * hidden_size

        self.seq_encoder = nn.LSTM(input_size=in_channels,
                                   hidden_size=hidden_size,
                                   num_layers=2,
                                   has_bias=True,
                                   dropout=0.,
                                   bidirectional=True)

        # TODO: do we need to add batch size to compute hx, as mentioned in the MindSpore LSTM doc?
        self.hx = None
        if batch_size is not None:
            h0 = Tensor(np.zeros([2 * 2, batch_size, hidden_size]).astype(np.float32))
            c0 = Tensor(np.zeros([2 * 2, batch_size, hidden_size]).astype(np.float32))
            self.hx = (h0, c0)

    def construct(self, features):
        """
        Args:
            x (Tensor): feature, a Tensor of shape :math:`(N, C, 1, W)`.
                Note that H must be 1. Width W can be viewed as time length in CRNN algorithm.
                C - input channels can be viewed as feature length for each time step.  N is batch size.

        Returns:
            Tensor: Encoded features . Shape :math:`(W, N, 2*C)` where
        """
        x = features[0]
        assert x.shape[2] == 1, f'Feature height must be 1, but got {x.shape[2]} from x.shape {x.shape}'
        x = ops.squeeze(x, axis=2)  # [N, C, W]
        x = ops.transpose(x, (2, 0, 1))  # [W, N, C]

        if self.hx is None:
            x, hx_n = self.seq_encoder(x)
        else:
            print('using self.hx')
            x, hx_n = self.seq_encoder(x, self.hx)  # the results are the same

        return x
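The shape flow above can be sketched in NumPy; the Bi-LSTM itself is stubbed with zero features of the right size, since the point here is why out_channels is 2 * hidden_size (a bidirectional LSTM concatenates forward and backward hidden states):

```python
import numpy as np

# Illustrative shapes only; the LSTM is replaced by placeholder arrays.
N, C, W, hidden = 8, 512, 25, 256
x = np.zeros((N, C, 1, W), dtype=np.float32)

x = np.squeeze(x, axis=2)        # (N, C, W)
x = np.transpose(x, (2, 0, 1))   # (W, N, C) -- time-major input for the LSTM

# A bidirectional LSTM concatenates forward and backward hidden states,
# so each time step yields 2 * hidden features.
fwd = np.zeros((W, N, hidden), dtype=np.float32)
bwd = np.zeros((W, N, hidden), dtype=np.float32)
encoded = np.concatenate([fwd, bwd], axis=-1)
print(encoded.shape)  # (25, 8, 512) == (W, N, 2 * hidden)
```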
mindocr.models.necks.rnn.RNNEncoder.construct(features)
PARAMETER DESCRIPTION
features

feature list; the first element is a Tensor of shape :math:(N, C, 1, W). Note that H must be 1. Width W can be viewed as the time length in the CRNN algorithm. C, the number of input channels, can be viewed as the feature length for each time step. N is the batch size.

TYPE: List[Tensor]

RETURNS DESCRIPTION
Tensor

Encoded features of shape :math:(W, N, 2 * hidden_size).

Source code in mindocr\models\necks\rnn.py
def construct(self, features):
    """
    Args:
        x (Tensor): feature, a Tensor of shape :math:`(N, C, 1, W)`.
            Note that H must be 1. Width W can be viewed as time length in CRNN algorithm.
            C - input channels can be viewed as feature length for each time step.  N is batch size.

    Returns:
        Tensor: Encoded features . Shape :math:`(W, N, 2*C)` where
    """
    x = features[0]
    assert x.shape[2] == 1, f'Feature height must be 1, but got {x.shape[2]} from x.shape {x.shape}'
    x = ops.squeeze(x, axis=2)  # [N, C, W]
    x = ops.transpose(x, (2, 0, 1))  # [W, N, C]

    if self.hx is None:
        x, hx_n = self.seq_encoder(x)
    else:
        print('using self.hx')
        x, hx_n = self.seq_encoder(x, self.hx)  # the results are the same

    return x
mindocr.models.necks.select
mindocr.models.necks.select.Select

Bases: nn.Cell

Select a feature map from the backbone output by index.

Source code in mindocr\models\necks\select.py
class Select(nn.Cell):
    '''
    Select a feature map from the backbone output by index.
    '''
    def __init__(self, in_channels, index=-1):
        super().__init__()
        self.index = index
        self.out_channels = in_channels[index]

    def construct(self, x):
        if isinstance(x, (list, tuple)):
            return x[self.index]
        return x
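A minimal plain-Python sketch of Select's behavior (the feature-map strings below are illustrative stand-ins for tensors):

```python
# Pick one feature from the backbone output list by index,
# or pass a lone (non-list) output through unchanged.
def select(x, index=-1):
    if isinstance(x, (list, tuple)):
        return x[index]
    return x

features = ["c2", "c3", "c4", "c5"]  # stand-ins for feature maps
print(select(features))        # c5 (last feature by default)
print(select(features, 0))     # c2
print(select("single"))        # single (non-list input is returned as-is)
```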
mindocr.models.rec_crnn
mindocr.models.rec_rare
mindocr.models.rec_svtr
mindocr.models.utils
mindocr.models.utils.load_model
mindocr.models.utils.load_model.load_model(network, load_from=None, filter_fn=None, auto_mapping=False, strict=False)

Load the checkpoint into the model

PARAMETER DESCRIPTION
network

network

load_from

a URL or a local path to a checkpoint that will be loaded into the network.

TYPE: Optional[str] DEFAULT: None

filter_fn

a function that filters the parameters to be loaded into the network. If it is None, all parameters will be loaded.

TYPE: Optional[Callable[[Dict], Dict]] DEFAULT: None

auto_mapping

when True, load the parameters even if their names differ slightly

TYPE: bool DEFAULT: False

strict

if True, the shapes and types of the parameters in the checkpoint and the network must be consistent; an exception is raised if they do not match.

TYPE: bool DEFAULT: False

Source code in mindocr\models\utils\load_model.py
def load_model(
    network,
    load_from: Optional[str] = None,
    filter_fn: Optional[Callable[[Dict], Dict]] = None,
    auto_mapping: bool = False,
    strict: bool = False,
):
    """
    Load the checkpoint into the model

    Args:
        network: network
        load_from: a URL or a local path to a checkpoint that will be loaded into the network.
        filter_fn: a function that filters the parameters to be loaded into the network. If it is None,
            all parameters will be loaded.
        auto_mapping: when True, load the parameters even if their names differ slightly
        strict: if True, the shapes and types of the parameters in the checkpoint and the network
            must be consistent; an exception is raised if they do not match.
    """
    if load_from is None:
        return

    if load_from.startswith("http"):
        url_cfg = {"url": load_from}
        local_ckpt_path = download_pretrained(url_cfg)
    else:
        local_ckpt_path = load_from

    assert local_ckpt_path and os.path.exists(local_ckpt_path), (
        f"Failed to load checkpoint. `{local_ckpt_path}` does NOT exist. \n"
        "Please check the path and set it in `eval-ckpt_load_path` or `model-pretrained` in the yaml config file."
    )

    params = load_checkpoint(local_ckpt_path)

    if filter_fn is not None:
        params = filter_fn(params)

    if auto_mapping:
        params = auto_map(network, params)

    if not strict:
        params = drop_inconsistent_shape_parameters(network, params)

    load_param_into_net(network, params, strict_load=strict)

    print(
        f"INFO: Finished loading model checkpoint from {load_from}. "
        "If no parameter fail-load warning is displayed, all checkpoint params have been loaded successfully."
    )
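As a sketch of how filter_fn can be used: a checkpoint loaded by load_checkpoint behaves like a name-to-parameter dict, so a filter can be a simple dict comprehension. The helper name and parameter names below are hypothetical, for illustration only:

```python
# Hypothetical filter_fn: keep only backbone parameters from a
# checkpoint dict (name -> value), dropping e.g. head weights.
def keep_backbone_only(params):
    return {k: v for k, v in params.items() if k.startswith("backbone.")}

# Illustrative checkpoint contents (names are made up).
ckpt = {
    "backbone.conv1.weight": 0.1,
    "backbone.bn1.gamma": 1.0,
    "head.fc.weight": 0.2,
}
filtered = keep_backbone_only(ckpt)
print(sorted(filtered))  # ['backbone.bn1.gamma', 'backbone.conv1.weight']
```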
mindocr.models.utils.rnn_cells

RNN Cells that supports FP16 inputs

mindocr.models.utils.rnn_cells.GRUCell

Bases: RNNCellBase

A GRU(Gated Recurrent Unit) cell.

.. math::

\begin{array}{ll}
r = \sigma(W_{ir} x + b_{ir} + W_{hr} h + b_{hr}) \\
z = \sigma(W_{iz} x + b_{iz} + W_{hz} h + b_{hz}) \\
n = \tanh(W_{in} x + b_{in} + r * (W_{hn} h + b_{hn})) \\
h' = (1 - z) * n + z * h
\end{array}

Here :math:\sigma is the sigmoid function, and :math:* is the Hadamard product. :math:W, b are learnable weights between the output and the input in the formula. For instance, :math:W_{ir}, b_{ir} are the weight and bias used to transform from input :math:x to :math:r. Details can be found in paper Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation <https://aclanthology.org/D14-1179.pdf>_.

The GRUCell can be simplified into an NN-layer form as the following formula:

.. math:: h{'} = GRUCell(x, h_0)

PARAMETER DESCRIPTION
input_size

Number of features of input.

TYPE: int

hidden_size

Number of features of hidden layer.

TYPE: int

has_bias

Whether the cell has bias b_in and b_hn. Default: True.

TYPE: bool DEFAULT: True

Inputs
  • x (Tensor) - Tensor of shape (batch_size, input_size).
  • hx (Tensor) - Tensor of data type mindspore.float32 and shape (batch_size, hidden_size). Data type of hx must be the same as x.
Outputs
  • hx' (Tensor) - Tensor of shape (batch_size, hidden_size).
RAISES DESCRIPTION
TypeError

If input_size, hidden_size is not an int.

TypeError

If has_bias is not a bool.

Supported Platforms

Ascend GPU CPU

Examples:

>>> net = nn.GRUCell(10, 16)
>>> x = Tensor(np.ones([5, 3, 10]).astype(np.float32))
>>> hx = Tensor(np.ones([3, 16]).astype(np.float32))
>>> output = []
>>> for i in range(5):
...     hx = net(x[i], hx)
...     output.append(hx)
>>> print(output[0].shape)
(3, 16)
Source code in mindocr\models\utils\rnn_cells.py
class GRUCell(RNNCellBase):
    r"""
    A GRU(Gated Recurrent Unit) cell.

    .. math::

        \begin{array}{ll}
        r = \sigma(W_{ir} x + b_{ir} + W_{hr} h + b_{hr}) \\
        z = \sigma(W_{iz} x + b_{iz} + W_{hz} h + b_{hz}) \\
        n = \tanh(W_{in} x + b_{in} + r * (W_{hn} h + b_{hn})) \\
        h' = (1 - z) * n + z * h
        \end{array}

    Here :math:`\sigma` is the sigmoid function, and :math:`*` is the Hadamard product. :math:`W, b`
    are learnable weights between the output and the input in the formula. For instance,
    :math:`W_{ir}, b_{ir}` are the weight and bias used to transform from input :math:`x` to :math:`r`.
    Details can be found in paper
    `Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
    <https://aclanthology.org/D14-1179.pdf>`_.

    The GRUCell can be simplified into an NN-layer form as the following formula:

    .. math::
        h^{'} = GRUCell(x, h_0)

    Args:
        input_size (int): Number of features of input.
        hidden_size (int):  Number of features of hidden layer.
        has_bias (bool): Whether the cell has bias `b_in` and `b_hn`. Default: True.

    Inputs:
        - **x** (Tensor) - Tensor of shape (batch_size, `input_size`).
        - **hx** (Tensor) - Tensor of data type mindspore.float32 and shape (batch_size, `hidden_size`).
          Data type of `hx` must be the same as `x`.

    Outputs:
        - **hx'** (Tensor) - Tensor of shape (batch_size, `hidden_size`).

    Raises:
        TypeError: If `input_size`, `hidden_size` is not an int.
        TypeError: If `has_bias` is not a bool.

    Supported Platforms:
        ``Ascend`` ``GPU`` ``CPU``

    Examples:
        >>> net = nn.GRUCell(10, 16)
        >>> x = Tensor(np.ones([5, 3, 10]).astype(np.float32))
        >>> hx = Tensor(np.ones([3, 16]).astype(np.float32))
        >>> output = []
        >>> for i in range(5):
        ...     hx = net(x[i], hx)
        ...     output.append(hx)
        >>> print(output[0].shape)
        (3, 16)
    """
    def __init__(self, input_size: int, hidden_size: int, has_bias: bool = True):
        super().__init__(input_size, hidden_size, has_bias, num_chunks=3)

    def construct(self, x, hx):
        _check_batch_size_equal(x.shape[0], hx.shape[0], self.cls_name)

        # FIX: cast the weights and biases to the same dtype as x
        # to prevent input-type inconsistency errors from the P.MatMul operator
        weight_ih = P.cast(self.weight_ih, x.dtype)
        weight_hh = P.cast(self.weight_hh, x.dtype)
        bias_ih = P.cast(self.bias_ih, x.dtype)
        bias_hh = P.cast(self.bias_hh, x.dtype)

        return _gru_cell(x, hx, weight_ih, weight_hh, bias_ih, bias_hh)
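The GRU equations in the docstring can be transcribed directly into NumPy. This is an illustrative sketch, not the MindSpore kernel; the weight layout (three stacked chunks for r, z, n) follows the num_chunks=3 convention used above and is an assumption about the internal layout:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# weight_ih stacks (W_ir, W_iz, W_in); weight_hh stacks (W_hr, W_hz, W_hn).
def gru_cell(x, h, weight_ih, weight_hh, bias_ih, bias_hh):
    gi = x @ weight_ih.T + bias_ih            # (batch, 3 * hidden)
    gh = h @ weight_hh.T + bias_hh
    i_r, i_z, i_n = np.split(gi, 3, axis=1)
    h_r, h_z, h_n = np.split(gh, 3, axis=1)
    r = sigmoid(i_r + h_r)                    # reset gate
    z = sigmoid(i_z + h_z)                    # update gate
    n = np.tanh(i_n + r * h_n)                # candidate state
    return (1 - z) * n + z * h                # h'

rng = np.random.default_rng(0)
batch, input_size, hidden = 3, 10, 16
x = rng.standard_normal((batch, input_size)).astype(np.float32)
h = rng.standard_normal((batch, hidden)).astype(np.float32)
w_ih = rng.standard_normal((3 * hidden, input_size)).astype(np.float32)
w_hh = rng.standard_normal((3 * hidden, hidden)).astype(np.float32)
b_ih = np.zeros(3 * hidden, dtype=np.float32)
b_hh = np.zeros(3 * hidden, dtype=np.float32)

h_next = gru_cell(x, h, w_ih, w_hh, b_ih, b_hh)
print(h_next.shape)  # (3, 16)
```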

mindocr.optim

optim init

mindocr.optim.adamw

Gradient clipping wrapper for optimizers.

mindocr.optim.adamw.AdamW

Bases: Optimizer

Implements gradient clipping by norm for an AdamWeightDecay optimizer.

Source code in mindocr\optim\adamw.py
class AdamW(Optimizer):
    """
    Implements the gradient clipping by norm for a AdamWeightDecay optimizer.
    """

    @opt_init_args_register
    def __init__(
        self,
        params,
        learning_rate=1e-3,
        beta1=0.9,
        beta2=0.999,
        eps=1e-8,
        weight_decay=0.0,
        loss_scale=1.0,
        clip=False,
    ):
        super().__init__(learning_rate, params, weight_decay)
        _check_param_value(beta1, beta2, eps, self.cls_name)
        self.beta1 = Tensor(np.array([beta1]).astype(np.float32))
        self.beta2 = Tensor(np.array([beta2]).astype(np.float32))
        self.eps = Tensor(np.array([eps]).astype(np.float32))
        self.moments1 = self.parameters.clone(prefix="adam_m", init="zeros")
        self.moments2 = self.parameters.clone(prefix="adam_v", init="zeros")
        self.hyper_map = ops.HyperMap()
        self.beta1_power = Parameter(initializer(1, [1], ms.float32), name="beta1_power")
        self.beta2_power = Parameter(initializer(1, [1], ms.float32), name="beta2_power")

        self.reciprocal_scale = Tensor(1.0 / loss_scale, ms.float32)
        self.clip = clip

    def construct(self, gradients):
        lr = self.get_lr()
        gradients = scale_grad(gradients, self.reciprocal_scale)
        if self.clip:
            gradients = ops.clip_by_global_norm(gradients, 5.0, None)

        beta1_power = self.beta1_power * self.beta1
        self.beta1_power = beta1_power
        beta2_power = self.beta2_power * self.beta2
        self.beta2_power = beta2_power

        if self.is_group:
            if self.is_group_lr:
                optim_result = self.hyper_map(
                    ops.partial(_adam_opt, beta1_power, beta2_power, self.beta1, self.beta2, self.eps),
                    lr,
                    self.weight_decay,
                    self.parameters,
                    self.moments1,
                    self.moments2,
                    gradients,
                    self.decay_flags,
                    self.optim_filter,
                )
            else:
                optim_result = self.hyper_map(
                    ops.partial(_adam_opt, beta1_power, beta2_power, self.beta1, self.beta2, self.eps, lr),
                    self.weight_decay,
                    self.parameters,
                    self.moments1,
                    self.moments2,
                    gradients,
                    self.decay_flags,
                    self.optim_filter,
                )
        else:
            optim_result = self.hyper_map(
                ops.partial(
                    _adam_opt, beta1_power, beta2_power, self.beta1, self.beta2, self.eps, lr, self.weight_decay
                ),
                self.parameters,
                self.moments1,
                self.moments2,
                gradients,
                self.decay_flags,
                self.optim_filter,
            )
        if self.use_parallel:
            self.broadcast_params(optim_result)
        return optim_result
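The distinguishing step of this AdamW variant is the global-norm gradient clipping applied before the usual AdamWeightDecay update (with the clip norm fixed at 5.0). A NumPy sketch of that clipping semantics:

```python
import numpy as np

# If the combined L2 norm of all gradients exceeds the threshold,
# every gradient is scaled down by the same factor.
def clip_by_global_norm(grads, clip_norm=5.0):
    global_norm = float(np.sqrt(sum(np.sum(g * g) for g in grads)))
    if global_norm <= clip_norm:
        return grads
    scale = clip_norm / global_norm
    return [g * scale for g in grads]

grads = [np.full(4, 3.0), np.full(4, 4.0)]  # global norm = sqrt(4*9 + 4*16) = 10
clipped = clip_by_global_norm(grads)
norm = float(np.sqrt(sum(np.sum(g * g) for g in clipped)))
print(round(norm, 4))  # 5.0
```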
mindocr.optim.adamw.tensor_grad_scale(scale, grad)

Get grad with scale.

Source code in mindocr\optim\adamw.py
@_grad_scale.register("Number", "Tensor")
def tensor_grad_scale(scale, grad):
    """Get grad with scale."""
    if scale == 1.0:
        return grad
    return ops.mul(grad, ops.cast(scale, grad.dtype))
mindocr.optim.adamw.tensor_grad_scale_with_tensor(scale, grad)

Get grad with scale.

Source code in mindocr\optim\adamw.py
@_grad_scale.register("Tensor", "Tensor")
def tensor_grad_scale_with_tensor(scale, grad):
    """Get grad with scale."""
    return ops.mul(grad, ops.cast(scale, grad.dtype))
mindocr.optim.adan

adan

mindocr.optim.adan.Adan

Bases: Optimizer

The Adan (ADAptive Nesterov momentum algorithm) Optimizer from https://arxiv.org/abs/2208.06677

Note: it is an experimental version.

Source code in mindocr\optim\adan.py
class Adan(Optimizer):
    """
    The Adan (ADAptive Nesterov momentum algorithm) Optimizer from https://arxiv.org/abs/2208.06677

    Note: it is an experimental version.
    """

    @opt_init_args_register
    def __init__(
        self,
        params,
        learning_rate=1e-3,
        beta1=0.98,
        beta2=0.92,
        beta3=0.99,
        eps=1e-8,
        use_locking=False,
        weight_decay=0.0,
        loss_scale=1.0,
    ):
        super().__init__(
            learning_rate, params, weight_decay=weight_decay, loss_scale=loss_scale
        )  # The optimizer's inherited weight decay is blocked; weight decay is computed in this file.

        _check_param_value(beta1, beta2, eps, self.cls_name)
        assert isinstance(use_locking, bool), f"For {self.cls_name}, use_locking should be bool"

        self.beta1 = Tensor(beta1, mstype.float32)
        self.beta2 = Tensor(beta2, mstype.float32)
        self.beta3 = Tensor(beta3, mstype.float32)

        self.eps = Tensor(eps, mstype.float32)
        self.use_locking = use_locking
        self.moment1 = self._parameters.clone(prefix="moment1", init="zeros")  # m
        self.moment2 = self._parameters.clone(prefix="moment2", init="zeros")  # v
        self.moment3 = self._parameters.clone(prefix="moment3", init="zeros")  # n
        self.prev_gradient = self._parameters.clone(prefix="prev_gradient", init="zeros")

        self.weight_decay = Tensor(weight_decay, mstype.float32)

    @ms_function
    def construct(self, gradients):
        params = self._parameters
        moment1 = self.moment1
        moment2 = self.moment2
        moment3 = self.moment3

        gradients = self.flatten_gradients(gradients)
        gradients = self.gradients_centralization(gradients)
        gradients = self.scale_grad(gradients)
        gradients = self._grad_sparse_indices_deduplicate(gradients)
        lr = self.get_lr()

        # TODO: currently not support dist
        success = self.map_(
            ops.partial(_adan_opt, self.beta1, self.beta2, self.beta3, self.eps, lr, self.weight_decay),
            params,
            moment1,
            moment2,
            moment3,
            gradients,
            self.prev_gradient,
        )

        return success

    @Optimizer.target.setter
    def target(self, value):
        """
        If the input value is set to "CPU", the parameters will be updated on the host using the Fused
        optimizer operation.
        """
        self._set_base_target(value)
mindocr.optim.adan.Adan.target(value)

If the input value is set to "CPU", the parameters will be updated on the host using the Fused optimizer operation.

Source code in mindocr\optim\adan.py
@Optimizer.target.setter
def target(self, value):
    """
    If the input value is set to "CPU", the parameters will be updated on the host using the Fused
    optimizer operation.
    """
    self._set_base_target(value)
mindocr.optim.lion
mindocr.optim.lion.Lion

Bases: Optimizer

Implementation of the Lion optimizer from the paper 'https://arxiv.org/abs/2302.06675'. Additionally, this implementation includes gradient clipping.

Notes: the learning rate is usually 3-10x smaller than for AdamW, and the weight decay is usually 3-10x larger than for AdamW.

Source code in mindocr\optim\lion.py
class Lion(Optimizer):
    """
    Implementation of Lion optimizer from paper 'https://arxiv.org/abs/2302.06675'.
    Additionally, this implementation is with gradient clipping.

    Notes:
    lr is usually 3-10x smaller than adamw.
    weight decay is usually 3-10x larger than adamw.
    """

    @opt_init_args_register
    def __init__(
        self,
        params,
        learning_rate=2e-4,
        beta1=0.9,
        beta2=0.99,
        weight_decay=0.0,
        loss_scale=1.0,
        clip=False,
    ):
        super().__init__(learning_rate, params, weight_decay)
        _check_param_value(beta1, beta2, self.cls_name)
        self.beta1 = Tensor(np.array([beta1]).astype(np.float32))
        self.beta2 = Tensor(np.array([beta2]).astype(np.float32))
        self.moments1 = self.parameters.clone(prefix="lion_m", init="zeros")
        self.hyper_map = ops.HyperMap()
        self.beta1_power = Parameter(initializer(1, [1], ms.float32), name="beta1_power")
        self.beta2_power = Parameter(initializer(1, [1], ms.float32), name="beta2_power")

        self.reciprocal_scale = Tensor(1.0 / loss_scale, ms.float32)
        self.clip = clip

    def construct(self, gradients):
        lr = self.get_lr()
        gradients = scale_grad(gradients, self.reciprocal_scale)
        if self.clip:
            gradients = ops.clip_by_global_norm(gradients, 5.0, None)

        beta1_power = self.beta1_power * self.beta1
        self.beta1_power = beta1_power
        beta2_power = self.beta2_power * self.beta2
        self.beta2_power = beta2_power

        if self.is_group:
            if self.is_group_lr:
                optim_result = self.hyper_map(
                    ops.partial(_lion_opt, beta1_power, beta2_power, self.beta1, self.beta2),
                    lr,
                    self.weight_decay,
                    self.parameters,
                    self.moments1,
                    gradients,
                    self.decay_flags,
                    self.optim_filter,
                )
            else:
                optim_result = self.hyper_map(
                    ops.partial(_lion_opt, beta1_power, beta2_power, self.beta1, self.beta2, lr),
                    self.weight_decay,
                    self.parameters,
                    self.moments1,
                    gradients,
                    self.decay_flags,
                    self.optim_filter,
                )
        else:
            optim_result = self.hyper_map(
                ops.partial(_lion_opt, beta1_power, beta2_power, self.beta1, self.beta2, lr, self.weight_decay),
                self.parameters,
                self.moments1,
                gradients,
                self.decay_flags,
                self.optim_filter,
            )
        if self.use_parallel:
            self.broadcast_params(optim_result)
        return optim_result
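The Lion update rule from the paper is compact enough to sketch in NumPy: the parameter step uses only the sign of an interpolated momentum, with decoupled weight decay. This is an illustrative transcription of the paper's algorithm, not the MindSpore implementation above:

```python
import numpy as np

# One Lion step: the step direction is the sign of an interpolation
# between momentum and gradient; the momentum is updated afterwards
# with a second interpolation coefficient.
def lion_step(param, grad, m, lr=2e-4, beta1=0.9, beta2=0.99, wd=0.0):
    update = np.sign(beta1 * m + (1 - beta1) * grad)
    param = param - lr * (update + wd * param)  # decoupled weight decay
    m = beta2 * m + (1 - beta2) * grad
    return param, m

p = np.array([1.0, -1.0])
g = np.array([0.5, -2.0])
m = np.zeros(2)
p, m = lion_step(p, g, m, lr=0.1)
# With zero initial momentum, each weight moves by exactly lr against
# the gradient's sign: p becomes [0.9, -0.9].
```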
mindocr.optim.lion.tensor_grad_scale(scale, grad)

Get grad with scale.

Source code in mindocr\optim\lion.py
@_grad_scale.register("Number", "Tensor")
def tensor_grad_scale(scale, grad):
    """Get grad with scale."""
    if scale == 1.0:
        return grad
    return ops.mul(grad, ops.cast(scale, grad.dtype))
mindocr.optim.lion.tensor_grad_scale_with_tensor(scale, grad)

Get grad with scale.

Source code in mindocr\optim\lion.py
@_grad_scale.register("Tensor", "Tensor")
def tensor_grad_scale_with_tensor(scale, grad):
    """Get grad with scale."""
    return ops.mul(grad, ops.cast(scale, grad.dtype))
mindocr.optim.nadam

nadam

mindocr.optim.nadam.NAdam

Bases: Optimizer

Implements NAdam algorithm (a variant of Adam based on Nesterov momentum).

Source code in mindocr\optim\nadam.py
class NAdam(Optimizer):
    """
    Implements NAdam algorithm (a variant of Adam based on Nesterov momentum).
    """

    @opt_init_args_register
    def __init__(
        self,
        params,
        learning_rate=2e-3,
        beta1=0.9,
        beta2=0.999,
        eps=1e-8,
        weight_decay=0.0,
        loss_scale=1.0,
        schedule_decay=4e-3,
    ):
        super().__init__(learning_rate, params, weight_decay, loss_scale)
        _check_param_value(beta1, beta2, eps, self.cls_name)
        self.beta1 = Tensor(np.array([beta1]).astype(np.float32))
        self.beta2 = Tensor(np.array([beta2]).astype(np.float32))
        self.eps = Tensor(np.array([eps]).astype(np.float32))
        self.moments1 = self.parameters.clone(prefix="nadam_m", init="zeros")
        self.moments2 = self.parameters.clone(prefix="nadam_v", init="zeros")
        self.schedule_decay = Tensor(np.array([schedule_decay]).astype(np.float32))
        self.mu_schedule = Parameter(initializer(1, [1], ms.float32), name="mu_schedule")
        self.beta2_power = Parameter(initializer(1, [1], ms.float32), name="beta2_power")

    @ms_function
    def construct(self, gradients):
        lr = self.get_lr()
        params = self.parameters
        step = self.global_step + _scaler_one
        gradients = self.decay_weight(gradients)
        mu = self.beta1 * (
            _scaler_one - Tensor(0.5, ms.float32) * ops.pow(Tensor(0.96, ms.float32), step * self.schedule_decay)
        )
        mu_next = self.beta1 * (
            _scaler_one
            - Tensor(0.5, ms.float32) * ops.pow(Tensor(0.96, ms.float32), (step + _scaler_one) * self.schedule_decay)
        )
        mu_schedule = self.mu_schedule * mu
        mu_schedule_next = self.mu_schedule * mu * mu_next
        self.mu_schedule = mu_schedule
        beta2_power = self.beta2_power * self.beta2
        self.beta2_power = beta2_power

        num_params = len(params)
        for i in range(num_params):
            ops.assign(self.moments1[i], self.beta1 * self.moments1[i] + (_scaler_one - self.beta1) * gradients[i])
            ops.assign(
                self.moments2[i], self.beta2 * self.moments2[i] + (_scaler_one - self.beta2) * ops.square(gradients[i])
            )

            regulate_m = mu_next * self.moments1[i] / (_scaler_one - mu_schedule_next) + (_scaler_one - mu) * gradients[
                i
            ] / (_scaler_one - mu_schedule)
            regulate_v = self.moments2[i] / (_scaler_one - beta2_power)

            update = params[i] - lr * regulate_m / (self.eps + ops.sqrt(regulate_v))
            ops.assign(params[i], update)

        return params
mindocr.optim.optim_factory

optim factory

mindocr.optim.optim_factory.create_optimizer(params, opt='adam', lr=0.001, weight_decay=0, momentum=0.9, nesterov=False, filter_bias_and_bn=True, loss_scale=1.0, schedule_decay=0.004, checkpoint_path='', eps=1e-10, **kwargs)

Creates optimizer by name.

PARAMETER DESCRIPTION
params

network parameters. Union[list[Parameter], list[dict]], which must be a list of parameters or a list of dicts. When a list element is a dictionary, the keys of the dictionary can be "params", "lr", "weight_decay", "grad_centralization" and "order_params".

opt

the optimizer name. You can choose from 'sgd', 'nesterov', 'momentum', 'adam', 'adamw', 'lion', 'rmsprop', 'adagrad' and 'lamb'. 'adam' is the default choice for convolution-based networks, while 'adamw' is recommended for ViT-based networks. Default: 'adam'.

TYPE: str DEFAULT: 'adam'

lr

learning rate: float or lr scheduler. Fixed and dynamic learning rate are supported. Default: 1e-3.

TYPE: Optional[float] DEFAULT: 0.001

weight_decay

weight decay factor. It should be noted that weight decay can be a constant value or a Cell. It is a Cell only when dynamic weight decay is applied. Dynamic weight decay is similar to dynamic learning rate, users need to customize a weight decay schedule only with global step as input, and during training, the optimizer calls the instance of WeightDecaySchedule to get the weight decay value of current step. Default: 0.

TYPE: float DEFAULT: 0

momentum

momentum if the optimizer supports. Default: 0.9.

TYPE: float DEFAULT: 0.9

nesterov

Whether to use Nesterov Accelerated Gradient (NAG) algorithm to update the gradients. Default: False.

TYPE: bool DEFAULT: False

filter_bias_and_bn

whether to filter batch norm parameters and bias from weight decay. If True, weight decay will not apply on BN parameters and bias in Conv or Dense layers. Default: True.

TYPE: bool DEFAULT: True

loss_scale

A floating point value for the loss scale, which must be larger than 0.0. Default: 1.0.

TYPE: float DEFAULT: 1.0

RETURNS DESCRIPTION

Optimizer object

Source code in mindocr\optim\optim_factory.py
def create_optimizer(
    params,
    opt: str = "adam",
    lr: Optional[float] = 1e-3,
    weight_decay: float = 0,
    momentum: float = 0.9,
    nesterov: bool = False,
    filter_bias_and_bn: bool = True,
    loss_scale: float = 1.0,
    schedule_decay: float = 4e-3,
    checkpoint_path: str = "",
    eps: float = 1e-10,
    **kwargs,
):
    r"""Creates optimizer by name.

    Args:
        params: network parameters. Union[list[Parameter], list[dict]], which must be a list of parameters
            or a list of dicts. When a list element is a dictionary, the keys of the dictionary can be
            "params", "lr", "weight_decay", "grad_centralization" and "order_params".
        opt: the optimizer name. You can choose from 'sgd', 'nesterov', 'momentum', 'adam', 'adamw', 'lion',
            'rmsprop', 'adagrad' and 'lamb'. 'adam' is the default choice for convolution-based networks,
            while 'adamw' is recommended for ViT-based networks. Default: 'adam'.
        lr: learning rate: float or lr scheduler. Fixed and dynamic learning rate are supported. Default: 1e-3.
        weight_decay: weight decay factor. It should be noted that weight decay can be a constant value or a Cell.
            It is a Cell only when dynamic weight decay is applied. Dynamic weight decay is similar to
            dynamic learning rate, users need to customize a weight decay schedule only with global step as input,
            and during training, the optimizer calls the instance of WeightDecaySchedule to get the weight decay value
            of current step. Default: 0.
        momentum: momentum if the optimizer supports. Default: 0.9.
        nesterov: Whether to use Nesterov Accelerated Gradient (NAG) algorithm to update the gradients. Default: False.
        filter_bias_and_bn: whether to filter batch norm parameters and bias from weight decay.
            If True, weight decay will not apply on BN parameters and bias in Conv or Dense layers. Default: True.
        loss_scale: A floating point value for the loss scale, which must be larger than 0.0. Default: 1.0.

    Returns:
        Optimizer object
    """
    opt = opt.lower()

    if weight_decay and filter_bias_and_bn:
        if not isinstance(params[0], dict):  # check whether param grouping strategy is encoded in `params`
            params = init_group_params(params, weight_decay)
        else:
            print(
                "WARNING: Customized param grouping strategy detected in `params`. "
                "filter_bias_and_bn (default=True) will be disabled"
            )

    # opt_args = dict(**kwargs)
    # if lr is not None:
    #    opt_args.setdefault('lr', lr)

    assert (
        loss_scale == 1.0
    ), "loss scale must be 1.0 in the optimizer because gradients are already scaled in TrainStepWrapper."

    # non-adaptive: SGD, momentum, and nesterov
    if opt == "sgd":
        # note: nn.Momentum may perform better if momentum > 0.
        opt_args = _collect_args(kwargs, nn.SGD)

        optimizer = nn.SGD(
            params=params,
            learning_rate=lr,
            momentum=momentum,
            weight_decay=weight_decay,
            nesterov=nesterov,
            loss_scale=loss_scale,
            **opt_args,
        )
    elif opt in ["momentum", "nesterov"]:
        opt_args = _collect_args(kwargs, nn.Momentum)
        optimizer = nn.Momentum(
            params=params,
            learning_rate=lr,
            momentum=momentum,
            weight_decay=weight_decay,
            use_nesterov=nesterov,
            loss_scale=loss_scale,
            **opt_args,
        )
    # adaptive
    elif opt == "adam":
        opt_args = _collect_args(kwargs, nn.Adam)
        optimizer = nn.Adam(
            params=params,
            learning_rate=lr,
            weight_decay=weight_decay,
            loss_scale=loss_scale,
            use_nesterov=nesterov,
            **opt_args,
        )
    elif opt == "adamw":
        opt_args = _collect_args(kwargs, AdamW)
        optimizer = AdamW(
            params=params,
            learning_rate=lr,
            weight_decay=weight_decay,
            loss_scale=loss_scale,
            **opt_args,
        )
    elif opt == "lion":
        opt_args = _collect_args(kwargs, Lion)
        optimizer = Lion(
            params=params,
            learning_rate=lr,
            weight_decay=weight_decay,
            loss_scale=loss_scale,
            **opt_args,
        )
    elif opt == "nadam":
        opt_args = _collect_args(kwargs, NAdam)
        optimizer = NAdam(
            params=params,
            learning_rate=lr,
            weight_decay=weight_decay,
            loss_scale=loss_scale,
            schedule_decay=schedule_decay,
            **opt_args,
        )
    elif opt == "adan":
        opt_args = _collect_args(kwargs, Adan)
        optimizer = Adan(
            params=params,
            learning_rate=lr,
            weight_decay=weight_decay,
            loss_scale=loss_scale,
            **opt_args,
        )
    elif opt == "rmsprop":
        opt_args = _collect_args(kwargs, nn.RMSProp)
        optimizer = nn.RMSProp(
            params=params,
            learning_rate=lr,
            momentum=momentum,
            weight_decay=weight_decay,
            loss_scale=loss_scale,
            epsilon=eps,
            **opt_args,
        )
    elif opt == "adagrad":
        opt_args = _collect_args(kwargs, nn.Adagrad)
        optimizer = nn.Adagrad(
            params=params,
            learning_rate=lr,
            weight_decay=weight_decay,
            loss_scale=loss_scale,
            **opt_args,
        )
    elif opt == "lamb":
        assert loss_scale == 1.0, "Loss scaler is not supported by Lamb optimizer"
        opt_args = _collect_args(kwargs, nn.Lamb)
        optimizer = nn.Lamb(
            params=params,
            learning_rate=lr,
            weight_decay=weight_decay,
            **opt_args,
        )
    else:
        raise ValueError(f"Invalid optimizer: {opt}")

    if os.path.exists(checkpoint_path):
        param_dict = load_checkpoint(checkpoint_path)
        load_param_into_net(optimizer, param_dict)

    return optimizer
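The long if/elif chain above can also be organized as a name-to-constructor mapping. The sketch below is illustrative only: `make_sgd`, `make_adam`, and `create_optimizer_sketch` are hypothetical stand-ins, not part of mindocr or MindSpore.

```python
# Illustrative sketch: dispatch optimizer construction via a dict
# instead of an if/elif chain. The constructors below are hypothetical
# stand-ins that just record their arguments.
def make_sgd(params, lr, **kw):
    return {"name": "sgd", "params": params, "lr": lr, **kw}

def make_adam(params, lr, **kw):
    return {"name": "adam", "params": params, "lr": lr, **kw}

_OPTIMIZERS = {"sgd": make_sgd, "adam": make_adam}

def create_optimizer_sketch(params, opt="adam", lr=1e-3, **kwargs):
    opt = opt.lower()  # case-insensitive lookup, as in create_optimizer
    if opt not in _OPTIMIZERS:
        raise ValueError(f"Invalid optimizer: {opt}")
    return _OPTIMIZERS[opt](params, lr, **kwargs)
```

A dict-based registry keeps the valid optimizer names in one place and makes the "Invalid optimizer" error path a single membership check.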
mindocr.optim.param_grouping

group parameters for setting different weight decay or learning rate for different layers in the network.

mindocr.optim.param_grouping.create_group_params(params, weight_decay=0, grouping_strategy=None, no_weight_decay_params=[], **kwargs)

create group parameters for setting different weight decay or learning rate for different layers in the network.

PARAMETER DESCRIPTION
params

network params

weight_decay

weight decay value

TYPE: float DEFAULT: 0

grouping_strategy

name of the hard-coded grouping strategy. If not None, parameters are grouped according to the predefined function and no_weight_decay_params takes no effect.

TYPE: str DEFAULT: None

no_weight_decay_params

list of param-name substrings used to exclude parameters from weight decay. If a parameter's name contains one of the substrings in the list, weight decay is not applied to that parameter. (Tip: param names can be checked by [p.name for p in network.trainable_params()].)

TYPE: list DEFAULT: []

Return

list[dict], grouped parameters

Source code in mindocr\optim\param_grouping.py
def create_group_params(params, weight_decay=0, grouping_strategy=None, no_weight_decay_params=[], **kwargs):
    """
    create group parameters for setting different weight decay or learning rate for different layers in the network.

    Args:
        params: network params
        weight_decay (float): weight decay value
        grouping_strategy (str): name of the hard-coded grouping strategy. If not None, parameters are grouped
            according to the predefined function and `no_weight_decay_params` takes no effect.
        no_weight_decay_params (list): list of param-name substrings used to exclude parameters from weight decay.
            If a parameter's name contains one of the substrings in the list, weight decay is not applied to that
            parameter. (Tip: param names can be checked by `[p.name for p in network.trainable_params()]`.)

    Return:
        list[dict], grouped parameters
    """

    # TODO: assert valid arg names
    gp = grouping_strategy

    # print(f'INFO: param grouping strategy: {grouping_strategy}, no_weight_decay_params: ', no_weight_decay_params)
    if gp is not None:
        if weight_decay == 0:
            print("WARNING: weight decay is 0 in param grouping, which is meaningless. Please check config setting.")
        if len(no_weight_decay_params) > 0:
            print(
                "WARNING: Both grouping_strategy and no_weight_decay_params are set, but grouping_strategy takes"
                f" priority. no_weight_decay_params={no_weight_decay_params} will take no effect."
            )

        if gp == "svtr":
            return grouping_svtr(params, weight_decay)
        elif gp == "filter_norm_and_bias":
            return grouping_default(params, weight_decay)
        else:
            raise ValueError(
                f"The grouping function for {gp} is not defined. "
                f"Valid grouping strategies are {supported_grouping_strategies}"
            )

    elif len(no_weight_decay_params) > 0:
        assert weight_decay > 0, f"Invalid weight decay value {weight_decay} for param grouping."
        decay_params = []
        no_decay_params = []
        for param in params:
            filter_param = False
            for k in no_weight_decay_params:
                if k in param.name:
                    filter_param = True

            if filter_param:
                no_decay_params.append(param)
            else:
                decay_params.append(param)

        return [
            {"params": decay_params, "weight_decay": weight_decay},
            {"params": no_decay_params},
            {"order_params": params},
        ]
    else:
        print("INFO: no parameter grouping is applied.")
        return params
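The substring-based filtering branch above can be exercised without MindSpore. The sketch below mimics it with a minimal stand-in parameter class; `FakeParam` and `group_params_sketch` are hypothetical names introduced only for illustration.

```python
# Minimal sketch of the substring filtering used by create_group_params.
# FakeParam is a hypothetical stand-in for a MindSpore Parameter.
class FakeParam:
    def __init__(self, name):
        self.name = name

def group_params_sketch(params, weight_decay, no_weight_decay_params):
    decay, no_decay = [], []
    for p in params:
        # exclude a param from weight decay if its name contains any listed substring
        if any(k in p.name for k in no_weight_decay_params):
            no_decay.append(p)
        else:
            decay.append(p)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay},
        {"order_params": params},
    ]

params = [FakeParam("conv1.weight"), FakeParam("bn1.gamma"), FakeParam("fc.bias")]
groups = group_params_sketch(params, 1e-4, ["gamma", "bias"])
```

Here `bn1.gamma` and `fc.bias` land in the no-decay group because their names contain the listed substrings, while `conv1.weight` keeps the 1e-4 weight decay.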

mindocr.postprocess

mindocr.postprocess.build_postprocess(config)

Create postprocess function.

PARAMETER DESCRIPTION
config

configuration for postprocess including postprocess name and also the kwargs specifically

TYPE: dict

Return

Object

Example
Create postprocess function

>>> from mindocr.postprocess import build_postprocess
>>> config = dict(name="RecCTCLabelDecode", use_space_char=False)
>>> postprocess = build_postprocess(config)
>>> postprocess

Source code in mindocr\postprocess\builder.py
def build_postprocess(config: dict):
    """
    Create postprocess function.

    Args:
        config (dict): configuration for postprocess including postprocess `name` and also the kwargs specifically
        for each postprocessor.
            - name (str): metric function name, exactly the same as one of the supported postprocess class names

    Return:
        Object

    Example:
        >>> # Create postprocess function
        >>> from mindocr.postprocess import build_postprocess
        >>> config = dict(name="RecCTCLabelDecode", use_space_char=False)
        >>> postprocess = build_postprocess(config)
        >>> postprocess
    """
    proc = config.pop("name")
    if proc in supported_postprocess:
        postprocessor = eval(proc)(**config)
    elif proc is None:
        return None
    else:
        raise ValueError(f"Invalid postprocess name {proc}, supported postprocess are {supported_postprocess}")

    return postprocessor
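`build_postprocess` looks the class name up in `supported_postprocess` and instantiates it via `eval`. An explicit name-to-class registry achieves the same dispatch without `eval`; the sketch below is illustrative, and `DummyDecode` is a hypothetical stand-in for a real postprocessor class.

```python
# Sketch of a registry-based builder that avoids eval() by mapping
# class names to classes. DummyDecode is a hypothetical postprocessor.
class DummyDecode:
    def __init__(self, use_space_char=False):
        self.use_space_char = use_space_char

_REGISTRY = {"DummyDecode": DummyDecode}

def build_postprocess_sketch(config):
    name = config.pop("name", None)
    if name is None:
        return None
    if name not in _REGISTRY:
        raise ValueError(f"Invalid postprocess name {name}")
    # remaining config entries become constructor kwargs, as in build_postprocess
    return _REGISTRY[name](**config)

pp = build_postprocess_sketch({"name": "DummyDecode", "use_space_char": True})
```

A registry also makes the set of valid names introspectable, which is handy for the error message listing supported postprocessors.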
mindocr.postprocess.cls_postprocess
mindocr.postprocess.cls_postprocess.ClsPostprocess

Bases: object

Map the predicted index back to the original format (angle).

Source code in mindocr\postprocess\cls_postprocess.py
class ClsPostprocess(object):
    """Map the predicted index back to the original format (angle)."""

    def __init__(self, label_list=None, **kwargs):
        assert (
            label_list is not None
        ), "`label_list` should not be None. Please set it in 'postprocess' section in yaml config file."
        self.label_list = label_list

    def __call__(self, preds, **kwargs):
        if isinstance(preds, Tensor):
            preds = preds.asnumpy()

        pred_idxs = preds.argmax(axis=1)

        angles, scores = [], []
        for i, idx in enumerate(pred_idxs):
            angles.append(self.label_list[idx])
            scores.append(preds[i, idx])
        decode_preds = {"angles": angles, "scores": scores}

        return decode_preds
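The decode step in `__call__` reduces to an argmax over class probabilities followed by an index-to-label lookup. A NumPy-only sketch of that core logic (the two-entry `label_list` is an assumed example, not a fixed mindocr default):

```python
import numpy as np

# NumPy-only sketch of ClsPostprocess.__call__: map argmax indices to labels.
label_list = ["0", "180"]  # assumed angle labels for illustration
preds = np.array([[0.9, 0.1],
                  [0.2, 0.8]])  # per-sample class probabilities

idxs = preds.argmax(axis=1)                          # predicted class per sample
angles = [label_list[i] for i in idxs]               # index -> angle label
scores = [preds[i, k] for i, k in enumerate(idxs)]   # confidence of the argmax
decoded = {"angles": angles, "scores": scores}
```

The first sample decodes to angle "0" with score 0.9 and the second to "180" with score 0.8, mirroring the dict returned by `ClsPostprocess`.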
mindocr.postprocess.det_base_postprocess
mindocr.postprocess.det_base_postprocess.DetBasePostprocess

Base class for all text detection postprocessings.

PARAMETER DESCRIPTION
box_type

text region representation type after postprocessing, options: ['quad', 'poly']

TYPE: str DEFAULT: 'quad'

rescale_fields

names of fields to rescale back to the shape of the original image.

TYPE: list DEFAULT: ['polys']

Source code in mindocr\postprocess\det_base_postprocess.py
class DetBasePostprocess:
    """
    Base class for all text detection postprocessings.

    Args:
        box_type (str): text region representation type after postprocessing, options: ['quad', 'poly']
        rescale_fields (list): names of fields to rescale back to the shape of the original image.
    """

    def __init__(self, box_type="quad", rescale_fields: List[str] = ["polys"]):
        assert box_type in ["quad", "poly"], f"box_type must be `quad` or `poly`, but found {box_type}"

        self._rescale_fields = rescale_fields
        self.warned = False
        if self._rescale_fields is None:
            print("WARNING: `rescale_fields` is None. Cannot rescale the predicted polygons to original image space")

    def _postprocess(self, pred: Union[ms.Tensor, Tuple[ms.Tensor], np.ndarray], **kwargs) -> dict:
        """
        Postprocess network prediction to get text boxes on the transformed image space (which will be rescaled back to
        original image space in __call__ function)

        Args:
            pred: network prediction for input batch data, shape [batch_size, ...]

        Return:
            postprocessing result as a dict with keys:
                - polys (List[List[np.ndarray]]): predicted polygons on the **transformed**
                (i.e. resized normally) image space, of shape (batch_size, num_polygons, num_points, 2).
                If `box_type` is 'quad', num_points=4.
                - scores (np.ndarray): confidence scores for the predicted polygons, shape (batch_size, num_polygons)

        Notes:
            - Please cast `pred` to the type you need in your implementation. Some postprocessing steps use ops from
              mindspore.nn and prefer Tensor type, while others prefer the np.ndarray type required by other libraries.
            - `_postprocess()` should **NOT round** the text box `polys` to integer in return, because the polygons
              will be rescaled and then rounded at the end. Rounding early causes larger error in polygon rescaling and
              results in **evaluation performance degradation**, especially on small datasets.
        """
        raise NotImplementedError

    def __call__(
        self,
        pred: Union[ms.Tensor, Tuple[ms.Tensor], np.ndarray],
        shape_list: Union[np.ndarray, ms.Tensor] = None,
        **kwargs,
    ) -> dict:
        """
        Execution entry for postprocessing, which postprocess network prediction on the transformed image space to get
        text boxes and then rescale them back to the original image space.

        Args:
            pred (Union[Tensor, Tuple[Tensor], np.ndarray]): network prediction for input batch data,
                shape [batch_size, ...]
            shape_list (Union[np.ndarray, ms.Tensor]): shape and scale info for each image in the batch,
                shape [batch_size, 4]. Each internal array is [src_h, src_w, scale_h, scale_w],
                where src_h and src_w are height and width of the original image, and scale_h and scale_w
                are their scale ratio during image resizing.

        Returns:
            detection result as a dict with keys:
                - polys (List[List[np.ndarray]]): predicted polygons mapped on the **original** image space,
                  shape [batch_size, num_polygons, num_points, 2]. If `box_type` is 'quad', num_points=4,
                  and the internal np.ndarray is of shape [4, 2]
                - scores (np.ndarray): confidence scores for the predicted polygons, shape (batch_size, num_polygons)
        """

        # 1. Check input type. Convert shape_list to np.ndarray
        if isinstance(shape_list, Tensor):
            shape_list = shape_list.asnumpy()

        if shape_list is not None:
            assert len(shape_list.shape) == 2 and shape_list.shape[1] == 4, (
                "The shape of each item in shape_list must be 4: [raw_img_h, raw_img_w, scale_h, scale_w]. "
                f"But got shape_list of shape {shape_list.shape}"
            )
        else:
            # shape_list = [[pred.shape[2], pred.shape[3], 1.0, 1.0] for i in range(pred.shape[0])] # H, W
            # shape_list = np.array(shape_list, dtype='float32')

            print(
                "WARNING: `shape_list` is None in postprocessing. Cannot rescale the prediction result to original "
                "image space, which can lead to inaccurate evaluation. You may add `shape_list` to `output_columns` "
                "list under eval section in yaml config file, or directly parse `shape_list` to postprocess callable "
                "function."
            )
            self.warned = True

        # 2. Core process
        result = self._postprocess(pred, **kwargs)

        # 3. Rescale processing results
        if shape_list is not None and self._rescale_fields is not None:
            result = self.rescale(result, shape_list)

        return result

    @staticmethod
    def _rescale_polygons(polygons: Union[List[np.ndarray], np.ndarray], shape_list: np.ndarray):
        """
        polygons (Union[List[np.ndarray], np.ndarray]): polygons for an image, shape [num_polygons, num_points, 2],
            value: xy coordinates for all polygon points
        shape_list (np.ndarray): shape and scale info for the image, shape [4,], value: [src_h, src_w, scale_h, scale_w]
        """
        scale = shape_list[:1:-1]
        size = shape_list[1::-1] - 1

        if isinstance(polygons, np.ndarray):
            polygons = np.clip(np.round(polygons / scale), 0, size)
        else:  # if polygons have different number of vertices and stored as a list
            polygons = [np.clip(np.round(poly / scale), 0, size) for poly in polygons]

        return polygons

    def rescale(self, result: Dict, shape_list: np.ndarray) -> dict:
        """
        rescale result back to original image shape

        Args:
            result (dict) with keys for the input data batch
                polys (np.ndarray): polygons for a batch of images, shape [batch_size, num_polygons, num_points, 2].
            shape_list (np.ndarray): image shape and scale info, shape [batch_size, 4]

        Return:
            rescaled result specified by rescale_field
        """

        for field in self._rescale_fields:
            assert (
                field in result
            ), f"Invalid field {field}. Found fields in intermediate postprocess result are {list(result.keys())}"

            for i, sample in enumerate(result[field]):
                if len(sample) > 0:
                    result[field][i] = self._rescale_polygons(sample, shape_list[i])

        return result
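The reversed slices in `_rescale_polygons` are easy to misread: with `shape_list = [src_h, src_w, scale_h, scale_w]`, `shape_list[:1:-1]` yields `[scale_w, scale_h]` and `shape_list[1::-1] - 1` yields `[src_w - 1, src_h - 1]`, both in the (x, y) order of the polygon points. A NumPy check with assumed example values:

```python
import numpy as np

# shape_list layout: [src_h, src_w, scale_h, scale_w] (example values)
shape_list = np.array([100.0, 200.0, 0.5, 0.25])

scale = shape_list[:1:-1]     # indices 3, 2 -> [scale_w, scale_h] = [0.25, 0.5]
size = shape_list[1::-1] - 1  # indices 1, 0 -> [src_w - 1, src_h - 1] = [199, 99]

# one polygon in the resized-image space, points stored as (x, y)
poly = np.array([[10.0, 20.0], [60.0, 40.0]])
# divide by the per-axis scale, round, then clip to the original image bounds
rescaled = np.clip(np.round(poly / scale), 0, size)
```

Dividing by `[scale_w, scale_h]` maps (x, y) back to original-image coordinates; the second point's x (60 / 0.25 = 240) is clipped to the image edge at 199.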
mindocr.postprocess.det_base_postprocess.DetBasePostprocess.__call__(pred, shape_list=None, **kwargs)

Execution entry for postprocessing, which postprocess network prediction on the transformed image space to get text boxes and then rescale them back to the original image space.

PARAMETER DESCRIPTION
pred

network prediction for input batch data, shape [batch_size, ...]

TYPE: Union[Tensor, Tuple[Tensor], np.ndarray]

shape_list

shape and scale info for each image in the batch, shape [batch_size, 4]. Each internal array is [src_h, src_w, scale_h, scale_w], where src_h and src_w are height and width of the original image, and scale_h and scale_w are their scale ratio during image resizing.

TYPE: Union[np.ndarray, ms.Tensor] DEFAULT: None

RETURNS DESCRIPTION
dict

detection result as a dict with keys: - polys (List[List[np.ndarray]): predicted polygons mapped on the original image space, shape [batch_size, num_polygons, num_points, 2]. If box_type is 'quad', num_points=4, and the internal np.ndarray is of shape [4, 2] - scores (np.ndarray): confidence scores for the predicted polygons, shape (batch_size, num_polygons)

Source code in mindocr\postprocess\det_base_postprocess.py
def __call__(
    self,
    pred: Union[ms.Tensor, Tuple[ms.Tensor], np.ndarray],
    shape_list: Union[np.ndarray, ms.Tensor] = None,
    **kwargs,
) -> dict:
    """
    Execution entry for postprocessing, which postprocess network prediction on the transformed image space to get
    text boxes and then rescale them back to the original image space.

    Args:
        pred (Union[Tensor, Tuple[Tensor], np.ndarray]): network prediction for input batch data,
            shape [batch_size, ...]
        shape_list (Union[np.ndarray, ms.Tensor]): shape and scale info for each image in the batch,
            shape [batch_size, 4]. Each internal array is [src_h, src_w, scale_h, scale_w],
            where src_h and src_w are height and width of the original image, and scale_h and scale_w
            are their scale ratio during image resizing.

    Returns:
        detection result as a dict with keys:
            - polys (List[List[np.ndarray]): predicted polygons mapped on the **original** image space,
              shape [batch_size, num_polygons, num_points, 2]. If `box_type` is 'quad', num_points=4,
              and the internal np.ndarray is of shape [4, 2]
            - scores (np.ndarray): confidence scores for the predicted polygons, shape (batch_size, num_polygons)
    """

    # 1. Check input type. Convert shape_list to np.ndarray
    if isinstance(shape_list, Tensor):
        shape_list = shape_list.asnumpy()

    if shape_list is not None:
        assert len(shape_list.shape) == 2 and shape_list.shape[1] == 4, (
            "The shape of each item in shape_list must be 4: [raw_img_h, raw_img_w, scale_h, scale_w]. "
            f"But got shape_list of shape {shape_list.shape}"
        )
    else:
        # shape_list = [[pred.shape[2], pred.shape[3], 1.0, 1.0] for i in range(pred.shape[0])] # H, W
        # shape_list = np.array(shape_list, dtype='float32')

        print(
            "WARNING: `shape_list` is None in postprocessing. Cannot rescale the prediction result to original "
            "image space, which can lead to inaccurate evaluation. You may add `shape_list` to `output_columns` "
            "list under eval section in yaml config file, or directly parse `shape_list` to postprocess callable "
            "function."
        )
        self.warned = True

    # 2. Core process
    result = self._postprocess(pred, **kwargs)

    # 3. Rescale processing results
    if shape_list is not None and self._rescale_fields is not None:
        result = self.rescale(result, shape_list)

    return result
mindocr.postprocess.det_base_postprocess.DetBasePostprocess.rescale(result, shape_list)

rescale result back to original image shape

PARAMETER DESCRIPTION
shape_list

image shape and scale info, shape [batch_size, 4]

TYPE: np.ndarray

Return

rescaled result specified by rescale_field

Source code in mindocr\postprocess\det_base_postprocess.py
def rescale(self, result: Dict, shape_list: np.ndarray) -> dict:
    """
    rescale result back to original image shape

    Args:
        result (dict) with keys for the input data batch
            polys (np.ndarray): polygons for a batch of images, shape [batch_size, num_polygons, num_points, 2].
        shape_list (np.ndarray): image shape and scale info, shape [batch_size, 4]

    Return:
        rescaled result specified by rescale_field
    """

    for field in self._rescale_fields:
        assert (
            field in result
        ), f"Invalid field {field}. Found fields in intermediate postprocess result are {list(result.keys())}"

        for i, sample in enumerate(result[field]):
            if len(sample) > 0:
                result[field][i] = self._rescale_polygons(sample, shape_list[i])

    return result
mindocr.postprocess.det_db_postprocess
mindocr.postprocess.det_db_postprocess.DBPostprocess

Bases: DetBasePostprocess

DBNet & DBNet++ postprocessing pipeline: extracts polygons / rectangles from a binary map (heatmap) and returns their coordinates.

PARAMETER DESCRIPTION
binary_thresh

binarization threshold applied to the heatmap output of DBNet.

TYPE: float DEFAULT: 0.3

box_thresh

polygon confidence threshold. Polygons with scores lower than this threshold are filtered out.

TYPE: float DEFAULT: 0.7

max_candidates

maximum number of proposed polygons.

TYPE: int DEFAULT: 1000

expand_ratio

controls by how much polygons need to be expanded to recover the original text shape (DBNet predicts shrunken text masks).

TYPE: float DEFAULT: 1.5

box_type

output polygons ('poly') or rectangles ('quad') as the network's predictions.

DEFAULT: 'quad'

pred_name

heatmap's name used for polygons extraction.

TYPE: str DEFAULT: 'binary'

rescale_fields

name of fields to scale back to the shape of the original image.

TYPE: List[str] DEFAULT: ['polys']

Source code in mindocr\postprocess\det_db_postprocess.py
class DBPostprocess(DetBasePostprocess):
    """
    DBNet & DBNet++ postprocessing pipeline: extracts polygons / rectangles from a binary map (heatmap) and returns
        their coordinates.
    Args:
        binary_thresh: binarization threshold applied to the heatmap output of DBNet.
        box_thresh: polygon confidence threshold. Polygons with scores lower than this threshold are filtered out.
        max_candidates: maximum number of proposed polygons.
        expand_ratio: controls by how much polygons need to be expanded to recover the original text shape
            (DBNet predicts shrunken text masks).
        box_type: output polygons ('poly') or rectangles ('quad') as the network's predictions.
        pred_name: heatmap's name used for polygons extraction.
        rescale_fields: name of fields to scale back to the shape of the original image.
    """

    def __init__(
        self,
        binary_thresh: float = 0.3,
        box_thresh: float = 0.7,
        max_candidates: int = 1000,
        expand_ratio: float = 1.5,
        box_type="quad",
        pred_name: str = "binary",
        rescale_fields: List[str] = ["polys"],
    ):
        super().__init__(box_type, rescale_fields)

        self._min_size = 3
        self._binary_thresh = binary_thresh
        self._box_thresh = box_thresh
        self._max_candidates = max_candidates
        self._expand_ratio = expand_ratio
        self._out_poly = box_type == "poly"
        self._name = pred_name
        self._names = {"binary": 0, "thresh": 1, "thresh_binary": 2}

    def _postprocess(self, pred: Union[Tensor, Tuple[Tensor], np.ndarray], **kwargs) -> dict:
        """
        Postprocess network prediction to get text boxes on the transformed image space (which will be rescaled back to
        original image space in __call__ function)

        Args:
            pred (Union[Tensor, Tuple[Tensor], np.ndarray]): network prediction consists of
                binary: text region segmentation map, with shape (N, 1, H, W)
                thresh: threshold prediction with shape (N, 1, H, W) (optional)
                thresh_binary: binary map thresholded by `thresh`, with shape (N, 1, H, W) (optional)

        Returns:
            postprocessing result as a dict with keys:
                - polys (List[np.ndarray]): predicted polygons on the **transformed** (i.e. resized normally) image
                space, of shape (batch_size, num_polygons, num_points, 2). If `box_type` is 'quad', num_points=4.
                - scores (np.ndarray): confidence scores for the predicted polygons, shape (batch_size, num_polygons)
        """
        if isinstance(pred, tuple):
            pred = pred[self._names[self._name]]
        if isinstance(pred, Tensor):
            pred = pred.asnumpy()
        if len(pred.shape) == 4 and pred.shape[1] != 1:  # pred shape (N, 3, H, W)
            pred = pred[:, :1, :, :]  # only need the first output
        if len(pred.shape) == 4:  # squeeze the channel dim: (N, 1, H, W) -> (N, H, W)
            pred = pred.squeeze(1)

        segmentation = pred >= self._binary_thresh

        polys, scores = [], []
        for pr, segm in zip(pred, segmentation):
            sample_polys, sample_scores = self._extract_preds(pr, segm)
            polys.append(sample_polys)
            scores.append(sample_scores)

        return {"polys": polys, "scores": scores}

    def _extract_preds(self, pred: np.ndarray, bitmap: np.ndarray):
        outs = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
        if len(outs) == 3:  # FIXME: update to OpenCV 4.x and delete this
            _, contours, _ = outs[0], outs[1], outs[2]
        elif len(outs) == 2:
            contours, _ = outs[0], outs[1]

        polys, scores = [], []
        for contour in contours[: self._max_candidates]:
            contour = contour.squeeze(1)
            score = self._calc_score(pred, bitmap, contour)
            if score < self._box_thresh:
                continue

            if self._out_poly:
                epsilon = 0.005 * cv2.arcLength(contour, closed=True)
                points = cv2.approxPolyDP(contour, epsilon, closed=True).squeeze(1)
                if points.shape[0] < 4:
                    continue
            else:
                points, min_side = self._fit_box(contour)
                if min_side < self._min_size:
                    continue

            poly = Polygon(points)
            poly = np.array(expand_poly(points, distance=poly.area * self._expand_ratio / poly.length))
            if self._out_poly and len(poly) > 1:
                continue
            poly = poly.reshape(-1, 2)

            _box, min_side = self._fit_box(poly)
            if min_side < self._min_size + 2:
                continue
            if not self._out_poly:
                poly = _box

            # TODO: an alternative solution to avoid calling self._fit_box twice:
            # box = Polygon(points)
            # box = np.array(
            # expand_poly(points, distance=box.area * self._expand_ratio / box.length, joint_type=pyclipper.JT_MITER))
            # assert box.shape[0] == 4, print(f'box shape is {box.shape}')

            polys.append(poly)
            scores.append(score)

        if self._out_poly:
            return polys, scores
        return np.array(polys), np.array(scores).astype(np.float32)

    @staticmethod
    def _fit_box(contour):
        """
        Finds a minimum rotated rectangle enclosing the contour.
        """
        # box = cv2.minAreaRect(contour)  # returns center of a rect, size, and angle
        # # TODO: does the starting point really matter?
        # points = np.roll(cv2.boxPoints(box), -1, axis=0)  # extract box points from a rotated rectangle
        # return points, min(box[1])

        bounding_box = cv2.minAreaRect(contour)
        points = sorted(list(cv2.boxPoints(bounding_box)), key=lambda x: x[0])

        # index_1, index_2, index_3, index_4 = 0, 1, 2, 3
        if points[1][1] > points[0][1]:
            index_1 = 0
            index_4 = 1
        else:
            index_1 = 1
            index_4 = 0
        if points[3][1] > points[2][1]:
            index_2 = 2
            index_3 = 3
        else:
            index_2 = 3
            index_3 = 2

        box = [points[index_1], points[index_2], points[index_3], points[index_4]]
        return box, min(bounding_box[1])

    @staticmethod
    def _calc_score(pred, mask, contour):
        # calculates score (mean value) of a prediction inside a given contour.
        min_vals = np.clip(np.floor(np.min(contour, axis=0)), 0, np.array(pred.shape[::-1]) - 1).astype(np.int32)
        max_vals = np.clip(np.ceil(np.max(contour, axis=0)), 0, np.array(pred.shape[::-1]) - 1).astype(np.int32)
        return cv2.mean(
            pred[min_vals[1] : max_vals[1] + 1, min_vals[0] : max_vals[0] + 1],
            mask[min_vals[1] : max_vals[1] + 1, min_vals[0] : max_vals[0] + 1].astype(np.uint8),
        )[0]
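The corner ordering performed by `_fit_box` (sort the four rectangle corners by x, then compare y within the left pair and the right pair to obtain top-left, top-right, bottom-right, bottom-left) can be sketched in plain Python; the sample points below are invented for illustration:

```python
def order_box_points(points):
    # Mirror _fit_box's ordering: sort the four corners by x, then compare y
    # within the left pair and the right pair to get [tl, tr, br, bl].
    points = sorted(points, key=lambda p: p[0])
    if points[1][1] > points[0][1]:
        tl, bl = points[0], points[1]
    else:
        tl, bl = points[1], points[0]
    if points[3][1] > points[2][1]:
        tr, br = points[2], points[3]
    else:
        tr, br = points[3], points[2]
    return [tl, tr, br, bl]

print(order_box_points([(10, 40), (10, 10), (50, 40), (50, 10)]))
# → [(10, 10), (50, 10), (50, 40), (10, 40)]
```

The same clockwise-from-top-left ordering is what `cv2.boxPoints` output is normalized into before the box is returned.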
mindocr.postprocess.det_east_postprocess
mindocr.postprocess.det_east_postprocess.EASTPostprocess

Bases: DetBasePostprocess

Source code in mindocr\postprocess\det_east_postprocess.py
class EASTPostprocess(DetBasePostprocess):
    def __init__(self, score_thresh=0.8, nms_thresh=0.2, box_type="quad", rescale_fields=["polys"]):
        super().__init__(box_type, rescale_fields)
        self._score_thresh = score_thresh
        self._nms_thresh = nms_thresh
        if rescale_fields is None:
            rescale_fields = []
        self._rescale_fields = rescale_fields

    def _postprocess(self, pred, **kwargs):
        """
        get boxes from feature map
        Input:
                pred (tuple) - (score, geo)
                    'score'       : score map from model <Tensor, (bs,1,row,col)>
                    'geo'         : geo map from model <Tensor, (bs,5,row,col)>
        Output:
                boxes       : dict of polys and scores {'polys': <numpy.ndarray, (bs,n,4,2)>, 'scores': numpy.ndarray,
                (bs,n,1)>)}
        """
        score, geo = pred
        if isinstance(score, Tensor):
            score = score.asnumpy()
        if isinstance(geo, Tensor):
            geo = geo.asnumpy()
        img_num = score.shape[0]
        polys_list = []
        scores_list = []
        for i in range(img_num):
            score_map, geo_map = score[i], geo[i]  # per-image slices; don't shadow the batch arrays
            score_map = score_map[0, :, :]
            xy_text = np.argwhere(score_map > self._score_thresh)
            if xy_text.size == 0:
                polys_list.append(np.array([]))
                scores_list.append(np.array([]))
                continue

            xy_text = xy_text[np.argsort(xy_text[:, 0])]
            valid_pos = xy_text[:, ::-1].copy()  # n x 2, [x, y]
            valid_geo = geo_map[:, xy_text[:, 0], xy_text[:, 1]]  # 5 x n
            polys_restored, index = self._restore_polys(valid_pos, valid_geo, score_map.shape)

            if polys_restored.size == 0:
                polys_list.append(np.array([]))
                scores_list.append(np.array([]))
                continue

            boxes = np.zeros((polys_restored.shape[0], 9), dtype=np.float32)
            boxes[:, :8] = polys_restored
            boxes[:, 8] = score_map[xy_text[index, 0], xy_text[index, 1]]
            boxes = lanms.merge_quadrangle_n9(boxes.astype("float32"), self._nms_thresh)
            polys = boxes[:, :8].reshape(-1, 4, 2)
            scores = boxes[:, 8].reshape(-1, 1)
            polys_list.append(polys)
            scores_list.append(scores)
        return {"polys": np.array(polys_list), "scores": np.array(scores_list)}

    def _restore_polys(self, valid_pos, valid_geo, score_shape, scale=4):
        """
        restore polys from feature maps in given positions
        Input:
                valid_pos  : potential text positions <numpy.ndarray, (n,2)>
                valid_geo  : geometry in valid_pos <numpy.ndarray, (5,n)>
                score_shape: shape of score map
                scale      : image / feature map
        Output:
                restored polys <numpy.ndarray, (n,8)>, index
        """
        polys = []
        index = []
        valid_pos *= scale
        d = valid_geo[:4, :]  # 4 x N
        angle = valid_geo[4, :]  # N,

        for i in range(valid_pos.shape[0]):
            x = valid_pos[i, 0]
            y = valid_pos[i, 1]
            y_min = y - d[0, i]
            y_max = y + d[1, i]
            x_min = x - d[2, i]
            x_max = x + d[3, i]
            rotate_mat = self._get_rotate_mat(-angle[i])

            temp_x = np.array([[x_min, x_max, x_max, x_min]]) - x
            temp_y = np.array([[y_min, y_min, y_max, y_max]]) - y
            coordinates = np.concatenate((temp_x, temp_y), axis=0)
            res = np.dot(rotate_mat, coordinates)
            res[0, :] += x
            res[1, :] += y

            if self._is_valid_poly(res, score_shape, scale):
                index.append(i)
                polys.append([res[0, 0], res[1, 0], res[0, 1], res[1, 1], res[0, 2], res[1, 2], res[0, 3], res[1, 3]])
        return np.array(polys), index

    def _get_rotate_mat(self, theta):
        """positive theta value means rotate clockwise"""
        return np.array([[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]])

    def _is_valid_poly(self, res, score_shape, scale):
        """
        check whether the poly lies within the image scope
        Input:
                res        : restored poly in original image
                score_shape: score map shape
                scale      : feature map -> image
        Output:
                True if valid
        """
        cnt = 0
        for i in range(res.shape[1]):
            if (
                res[0, i] < 0
                or res[0, i] >= score_shape[1] * scale
                or res[1, i] < 0
                or res[1, i] >= score_shape[0] * scale
            ):
                cnt += 1
        return cnt <= 1
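The geometry decoding in `_restore_polys` can be sketched for a single position: the four distances define an axis-aligned box around the anchor pixel (x, y), which is then rotated about (x, y) using the rotation matrix for `-angle`. A minimal plain-Python version (the numeric values are illustrative):

```python
import math

def restore_quad(x, y, d_top, d_bottom, d_left, d_right, angle=0.0):
    # Distances from pixel (x, y) to the four box edges, plus a rotation angle,
    # as in EASTPostprocess._restore_polys (rotation matrix built for -angle).
    y_min, y_max = y - d_top, y + d_bottom
    x_min, x_max = x - d_left, x + d_right
    cos, sin = math.cos(-angle), math.sin(-angle)
    corners = []
    for cx, cy in [(x_min, y_min), (x_max, y_min), (x_max, y_max), (x_min, y_max)]:
        dx, dy = cx - x, cy - y  # rotate around the anchor pixel
        corners.append((x + cos * dx - sin * dy, y + sin * dx + cos * dy))
    return corners

print(restore_quad(100, 50, 10, 10, 30, 30))
# angle=0 → axis-aligned box [(70.0, 40.0), (130.0, 40.0), (130.0, 60.0), (70.0, 60.0)]
```

With a non-zero angle the same four corners are rotated clockwise about (x, y), matching `_get_rotate_mat`'s sign convention.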
mindocr.postprocess.det_pse_postprocess
mindocr.postprocess.det_pse_postprocess.PSEPostprocess

Bases: DetBasePostprocess

Source code in mindocr\postprocess\det_pse_postprocess.py
class PSEPostprocess(DetBasePostprocess):
    def __init__(
        self,
        binary_thresh=0.5,
        box_thresh=0.85,
        min_area=16,
        box_type="quad",
        scale=4,
        output_score_kernels=False,
        rescale_fields=["polys"],
    ):
        super().__init__(box_type, rescale_fields)

        from .pse import pse

        self._binary_thresh = binary_thresh
        self._box_thresh = box_thresh
        self._min_area = min_area
        self._box_type = box_type
        self._scale = scale
        self._interpolate = nn.ResizeBilinear()
        self._sigmoid = nn.Sigmoid()
        if rescale_fields is None:
            rescale_fields = []
        self._rescale_fields = rescale_fields
        self._pse = pse
        self._output_score_kernels = output_score_kernels

    def _postprocess(self, pred, **kwargs):  # pred: N 7 H W
        """
        Args:
            pred (Tensor): network prediction with shape [BS, C, H, W]
        """
        score, kernels = None, None
        if self._output_score_kernels:
            score = pred[0]
            kernels = pred[1].astype(np.uint8)
        else:
            if isinstance(pred, tuple):  # at inference time, only the first output is needed
                pred = pred[0]
            if not isinstance(pred, Tensor):
                pred = Tensor(pred)

            pred = self._interpolate(pred, scale_factor=4 // self._scale)
            score = self._sigmoid(pred[:, 0, :, :])

            kernels = (pred > self._binary_thresh).astype(ms.float32)
            text_mask = kernels[:, :1, :, :]
            text_mask = text_mask.astype(ms.int8)

            kernels[:, 1:, :, :] = kernels[:, 1:, :, :] * text_mask
            score = score.asnumpy()
            kernels = kernels.asnumpy().astype(np.uint8)
        poly_list, score_list = [], []
        for batch_idx in range(score.shape[0]):
            boxes, scores = self._boxes_from_bitmap(score[batch_idx], kernels[batch_idx])
            poly_list.append(boxes)
            score_list.append(scores)

        return {"polys": poly_list, "scores": score_list}

    def _boxes_from_bitmap(self, score, kernels):
        label = self._pse(kernels, self._min_area)
        return self._generate_box(score, label)

    def _generate_box(self, score, label):
        label_num = np.max(label) + 1
        boxes = []
        scores = []
        for i in range(1, label_num):
            ind = label == i
            points = np.array(np.where(ind)).transpose((1, 0))[:, ::-1]
            if points.shape[0] < self._min_area:
                label[ind] = 0
                continue

            score_i = np.mean(score[ind])
            if score_i < self._box_thresh:
                label[ind] = 0
                continue

            if self._box_type == "quad":
                rect = cv2.minAreaRect(points)
                bbox = cv2.boxPoints(rect)
            elif self._box_type == "poly":
                box_height = np.max(points[:, 1]) + 10
                box_width = np.max(points[:, 0]) + 10
                mask = np.zeros((box_height, box_width), np.uint8)
                mask[points[:, 1], points[:, 0]] = 255
                contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
                bbox = np.squeeze(contours[0], 1)
            else:
                raise NotImplementedError(
                    f"The value of param 'box_type' can only be 'quad' or 'poly', but got '{self._box_type}'."
                )
            boxes.append(bbox)
            scores.append(score_i)

        return boxes, scores
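The per-component filtering in `_generate_box` (drop connected components with too few pixels or with a mean score below `box_thresh`) can be sketched without OpenCV; the toy label map and score map below are invented:

```python
from collections import defaultdict

def filter_labels(label, score, min_area=2, box_thresh=0.85):
    # label: 2D list of component ids (0 = background); score: 2D list of floats.
    # Keep only ids that pass the area and mean-score checks, as in _generate_box.
    pts = defaultdict(list)
    for r, row in enumerate(label):
        for c, lab in enumerate(row):
            if lab > 0:
                pts[lab].append((r, c))
    kept = []
    for lab, coords in sorted(pts.items()):
        if len(coords) < min_area:
            continue  # component too small
        if sum(score[r][c] for r, c in coords) / len(coords) < box_thresh:
            continue  # mean confidence too low
        kept.append(lab)
    return kept

label = [[1, 1, 0], [0, 2, 0], [0, 2, 0]]
score = [[0.9, 0.95, 0.0], [0.0, 0.3, 0.0], [0.0, 0.4, 0.0]]
print(filter_labels(label, score))  # → [1]  (label 2 has mean score 0.35 < 0.85)
```

In the actual implementation the surviving components are then turned into quads via `cv2.minAreaRect` or into polygons via `cv2.findContours`.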
mindocr.postprocess.rec_postprocess
mindocr.postprocess.rec_postprocess.RecAttnLabelDecode
Source code in mindocr\postprocess\rec_postprocess.py
class RecAttnLabelDecode:
    def __init__(
        self, character_dict_path: Optional[str] = None, use_space_char: bool = False, lower: bool = False
    ) -> None:
        """
        Convert text label (str) to a sequence of character indices according to the char dictionary

        Args:
            character_dict_path: path to dictionary, if None, a dictionary containing 36 chars
                (i.e., "0123456789abcdefghijklmnopqrstuvwxyz") will be used.
            use_space_char(bool): if True, add space char to the dict to recognize the space in between two words
            lower (bool): if True, all upper-case chars in the label text will be converted to lower case.
                Set to True if the dictionary only contains lower-case chars.
                Set to False otherwise to recognize both upper-case and lower-case chars.

        Attributes:
            go_idx: the index of the GO token
            stop_idx: the index of the STOP token
            num_valid_chars: the number of valid characters (including space char if used) in the dictionary
            num_classes: the number of classes (the valid characters plus the two special tokens <GO> and <STOP>),
                so num_classes = num_valid_chars + 2
        """
        self.lower = lower

        # read dict
        if character_dict_path is None:
            char_list = list("0123456789abcdefghijklmnopqrstuvwxyz")

            self.lower = True
            print("INFO: The character_dict_path is None, model can only recognize number and lower letters")
        else:
            # parse char dictionary
            char_list = []
            with open(character_dict_path, "r") as f:
                for line in f:
                    c = line.rstrip("\n\r")
                    char_list.append(c)

        # add space char if set
        if use_space_char:
            if " " not in char_list:
                char_list.append(" ")
            self.space_idx = len(char_list) + 1
        else:
            if " " in char_list:
                print(
                    "WARNING: The dict still contains space char in dict although use_space_char is set to be False, "
                    "because the space char is coded in the dictionary file ",
                    character_dict_path,
                )

        self.num_valid_chars = len(char_list)  # the number of valid chars (including space char if used)

        special_token = ["<GO>", "<STOP>"]
        char_list = special_token + char_list

        self.go_idx = 0
        self.stop_idx = 1

        self.character = {idx: c for idx, c in enumerate(char_list)}

        self.num_classes = len(self.character)

    def decode(self, char_indices: np.ndarray, probs: np.ndarray) -> Tuple[List[str], List[float]]:
        texts = list()
        confs = list()

        batch_size = len(char_indices)
        for batch_idx in range(batch_size):
            char_list = [self.character[i] for i in char_indices[batch_idx]]

            try:
                pred_EOS = char_list.index("<STOP>")
            except ValueError:
                pred_EOS = -1

            if self.lower:
                char_list = [x.lower() for x in char_list]

            if pred_EOS != -1:
                char_list = char_list[:pred_EOS]
                text = "".join(char_list)
            else:
                text = ""

            if probs is not None and pred_EOS != -1:
                conf_list = probs[batch_idx][:pred_EOS]
            else:
                conf_list = [0]

            texts.append(text)
            confs.append(np.mean(conf_list))
        return texts, confs

    def __call__(self, preds: Union[Tensor, np.ndarray], labels=None, **kwargs) -> Dict[str, Any]:
        """
        Args:
            preds (Tensor, np.ndarray or tuple): prediction tensor in shape [BS, W, num_classes]
        Return:
            dict with keys 'texts' (List[str]), 'confs' (List[float]) and 'raw_chars' (List[List[str]])
        """
        if isinstance(preds, tuple):
            preds = preds[-1]

        if isinstance(preds, Tensor):
            preds = preds.asnumpy()

        pred_indices = preds.argmax(axis=-1)
        pred_probs = preds.max(axis=-1)

        raw_chars = [[self.character[idx] for idx in pred_indices[b]] for b in range(pred_indices.shape[0])]

        texts, confs = self.decode(pred_indices, pred_probs)

        return {"texts": texts, "confs": confs, "raw_chars": raw_chars}
mindocr.postprocess.rec_postprocess.RecAttnLabelDecode.__call__(preds, labels=None, **kwargs)
PARAMETER DESCRIPTION
preds

containing prediction tensor in shape [BS, W, num_classes]

TYPE: Tensor, np.ndarray or tuple

Return

dict with keys 'texts' (List[str]), 'confs' (List[float]) and 'raw_chars'

Source code in mindocr\postprocess\rec_postprocess.py
def __call__(self, preds: Union[Tensor, np.ndarray], labels=None, **kwargs) -> Dict[str, Any]:
    """
    Args:
        preds (Tensor, np.ndarray or tuple): prediction tensor in shape [BS, W, num_classes]
    Return:
        dict with keys 'texts' (List[str]), 'confs' (List[float]) and 'raw_chars' (List[List[str]])
    """
    if isinstance(preds, tuple):
        preds = preds[-1]

    if isinstance(preds, Tensor):
        preds = preds.asnumpy()

    pred_indices = preds.argmax(axis=-1)
    pred_probs = preds.max(axis=-1)

    raw_chars = [[self.character[idx] for idx in pred_indices[b]] for b in range(pred_indices.shape[0])]

    texts, confs = self.decode(pred_indices, pred_probs)

    return {"texts": texts, "confs": confs, "raw_chars": raw_chars}
mindocr.postprocess.rec_postprocess.RecAttnLabelDecode.__init__(character_dict_path=None, use_space_char=False, lower=False)

Convert text label (str) to a sequence of character indices according to the char dictionary

PARAMETER DESCRIPTION
character_dict_path

path to dictionary, if None, a dictionary containing 36 chars (i.e., "0123456789abcdefghijklmnopqrstuvwxyz") will be used.

TYPE: Optional[str] DEFAULT: None

use_space_char(bool)

if True, add space char to the dict to recognize the space in between two words

lower

if True, all upper-case chars in the label text will be converted to lower case. Set to True if the dictionary only contains lower-case chars; set to False otherwise to recognize both upper-case and lower-case characters.

TYPE: bool DEFAULT: False

ATTRIBUTE DESCRIPTION
go_idx

the index of the GO token

stop_idx

the index of the STOP token

num_valid_chars

the number of valid characters (including space char if used) in the dictionary

num_classes

the number of classes (the valid characters plus the two special tokens <GO> and <STOP>), so num_classes = num_valid_chars + 2

Source code in mindocr\postprocess\rec_postprocess.py
def __init__(
    self, character_dict_path: Optional[str] = None, use_space_char: bool = False, lower: bool = False
) -> None:
    """
    Convert text label (str) to a sequence of character indices according to the char dictionary

    Args:
        character_dict_path: path to dictionary, if None, a dictionary containing 36 chars
            (i.e., "0123456789abcdefghijklmnopqrstuvwxyz") will be used.
        use_space_char(bool): if True, add space char to the dict to recognize the space in between two words
        lower (bool): if True, all upper-case chars in the label text will be converted to lower case.
            Set to True if the dictionary only contains lower-case chars.
            Set to False otherwise to recognize both upper-case and lower-case chars.

    Attributes:
        go_idx: the index of the GO token
        stop_idx: the index of the STOP token
        num_valid_chars: the number of valid characters (including space char if used) in the dictionary
        num_classes: the number of classes (the valid characters plus the two special tokens <GO> and <STOP>),
            so num_classes = num_valid_chars + 2
    """
    self.lower = lower

    # read dict
    if character_dict_path is None:
        char_list = list("0123456789abcdefghijklmnopqrstuvwxyz")

        self.lower = True
        print("INFO: The character_dict_path is None, model can only recognize number and lower letters")
    else:
        # parse char dictionary
        char_list = []
        with open(character_dict_path, "r") as f:
            for line in f:
                c = line.rstrip("\n\r")
                char_list.append(c)

    # add space char if set
    if use_space_char:
        if " " not in char_list:
            char_list.append(" ")
        self.space_idx = len(char_list) + 1
    else:
        if " " in char_list:
            print(
                "WARNING: The dict still contains space char in dict although use_space_char is set to be False, "
                "because the space char is coded in the dictionary file ",
                character_dict_path,
            )

    self.num_valid_chars = len(char_list)  # the number of valid chars (including space char if used)

    special_token = ["<GO>", "<STOP>"]
    char_list = special_token + char_list

    self.go_idx = 0
    self.stop_idx = 1

    self.character = {idx: c for idx, c in enumerate(char_list)}

    self.num_classes = len(self.character)
mindocr.postprocess.rec_postprocess.RecCTCLabelDecode

Bases: object

Convert text label (str) to a sequence of character indices according to the char dictionary

PARAMETER DESCRIPTION
character_dict_path

path to dictionary, if None, a dictionary containing 36 chars (i.e., "0123456789abcdefghijklmnopqrstuvwxyz") will be used.

DEFAULT: None

use_space_char(bool)

if True, add space char to the dict to recognize the space in between two words

blank_at_last(bool)

pad with the blank index (not the space index). If True, a blank/padding token is appended to the end of the dictionary, so that blank_index = num_chars, where num_chars is the number of characters in the dictionary including the space char if used. If False, the blank token is inserted at the beginning of the dictionary, so blank_index=0.

lower

if True, all upper-case chars in the label text will be converted to lower case. Set to True if the dictionary only contains lower-case chars; set to False otherwise to recognize both upper-case and lower-case characters.

TYPE: bool DEFAULT: False

ATTRIBUTE DESCRIPTION
blank_idx

the index of the blank token for padding

num_valid_chars

the number of valid characters (including space char if used) in the dictionary

num_classes

the number of classes (the valid characters plus the special blank/padding token), so num_classes = num_valid_chars + 1

Source code in mindocr\postprocess\rec_postprocess.py
class RecCTCLabelDecode(object):
    """Convert text label (str) to a sequence of character indices according to the char dictionary

    Args:
        character_dict_path: path to dictionary, if None, a dictionary containing 36 chars
            (i.e., "0123456789abcdefghijklmnopqrstuvwxyz") will be used.
        use_space_char(bool): if True, add space char to the dict to recognize the space in between two words
        blank_at_last(bool): pad with the blank index (not the space index).
            If True, a blank/padding token is appended to the end of the dictionary, so that
            blank_index = num_chars, where num_chars is the number of characters in the dictionary including
            the space char if used. If False, the blank token is inserted at the beginning of the dictionary,
            so blank_index=0.
        lower (bool): if True, all upper-case chars in the label text will be converted to lower case.
            Set to True if the dictionary only contains lower-case chars.
            Set to False otherwise to recognize both upper-case and lower-case chars.

    Attributes:
        blank_idx: the index of the blank token for padding
        num_valid_chars: the number of valid characters (including space char if used) in the dictionary
        num_classes: the number of classes (the valid characters plus the special blank/padding token),
            so num_classes = num_valid_chars + 1


    """

    def __init__(
        self,
        character_dict_path=None,
        use_space_char=False,
        blank_at_last=True,
        lower=False,
    ):
        self.space_idx = None
        self.lower = lower

        # read dict
        if character_dict_path is None:
            char_list = list("0123456789abcdefghijklmnopqrstuvwxyz")
            self.lower = True
            print(
                "INFO: `character_dict_path` for RecCTCLabelDecode is not given. "
                'Default dict "0123456789abcdefghijklmnopqrstuvwxyz" is applied. Only number and English letters '
                "(regardless of lower/upper case) will be recognized and evaluated."
            )
        else:
            # parse char dictionary
            char_list = []
            with open(character_dict_path, "r") as f:
                for line in f:
                    c = line.rstrip("\n\r")
                    char_list.append(c)
        # add space char if set
        if use_space_char:
            if " " not in char_list:
                char_list.append(" ")
            self.space_idx = len(char_list) - 1
        else:
            if " " in char_list:
                print(
                    "WARNING: The dict still contains space char in dict although use_space_char is set to be False, "
                    "because the space char is coded in the dictionary file ",
                    character_dict_path,
                )

        self.num_valid_chars = len(char_list)  # the number of valid chars (including space char if used)

        # add blank token for padding
        if blank_at_last:
            # the index of a char in dict is [0, num_chars-1], blank index is set to num_chars
            char_list.append("<PAD>")
            self.blank_idx = self.num_valid_chars
        else:
            char_list = ["<PAD>"] + char_list
            self.blank_idx = 0

        self.ignore_indices = [self.blank_idx]

        self.character = {idx: c for idx, c in enumerate(char_list)}

        self.num_classes = len(self.character)

    def decode(self, char_indices, prob=None, remove_duplicate=False):
        """
        Convert to a squence of char indices to text string
        Args:
            char_indices (np.ndarray): in shape [BS, W]
        Returns:
            text
        """

        """ convert text-index into text-label. """
        texts = []
        confs = []
        batch_size = len(char_indices)
        for batch_idx in range(batch_size):
            selection = np.ones(len(char_indices[batch_idx]), dtype=bool)
            if remove_duplicate:
                selection[1:] = char_indices[batch_idx][1:] != char_indices[batch_idx][:-1]
            for ignored_token in self.ignore_indices:
                selection &= char_indices[batch_idx] != ignored_token

            char_list = [self.character[text_id] for text_id in char_indices[batch_idx][selection]]
            if prob is not None:
                conf_list = prob[batch_idx][selection]
            else:
                conf_list = [1] * len(selection)
            if len(conf_list) == 0:
                conf_list = [0]

            if self.lower:
                char_list = [x.lower() for x in char_list]

            text = "".join(char_list)

            # result_list.append((text, np.mean(conf_list).tolist()))
            texts.append(text)
            confs.append(np.mean(conf_list))
        return texts, confs

    def __call__(self, preds: Union[Tensor, np.ndarray], labels=None, **kwargs):
        """
        Args:
            preds (Union[Tensor, np.ndarray]): network prediction, class probabilities in shape [BS, W, num_classes],
                where W is the sequence length.
            labels: optional
        Return:
            dict with keys 'texts' (List[str]), 'confs' (List[float]) and 'raw_chars' (List[List[str]])

        """
        if isinstance(preds, tuple):
            preds = preds[-1]

        if isinstance(preds, Tensor):
            preds = preds.asnumpy()

        # preds = preds.transpose([1, 0, 2]) # [W, BS, C] -> [BS, W, C]. already did in model head.
        pred_indices = preds.argmax(axis=-1)
        pred_prob = preds.max(axis=-1)

        # print('pred indices: ', pred_indices)
        # print('pred prob: ', pred_prob.shape)

        # TODO: for debug only
        raw_chars = [[self.character[idx] for idx in pred_indices[b]] for b in range(pred_indices.shape[0])]

        texts, confs = self.decode(pred_indices, pred_prob, remove_duplicate=True)

        return {"texts": texts, "confs": confs, "raw_chars": raw_chars}
mindocr.postprocess.rec_postprocess.RecCTCLabelDecode.__call__(preds, labels=None, **kwargs)
PARAMETER DESCRIPTION
preds

network prediction, class probabilities in shape [BS, W, num_classes], where W is the sequence length.

TYPE: Union[Tensor, np.ndarray]

labels

optional

DEFAULT: None

Return

dict with keys 'texts' (List[str]), 'confs' (List[float]) and 'raw_chars'

Source code in mindocr\postprocess\rec_postprocess.py
def __call__(self, preds: Union[Tensor, np.ndarray], labels=None, **kwargs):
    """
    Args:
        preds (Union[Tensor, np.ndarray]): network prediction, class probabilities in shape [BS, W, num_classes],
            where W is the sequence length.
        labels: optional
    Return:
        dict with keys 'texts' (List[str]), 'confs' (List[float]) and 'raw_chars' (List[List[str]])

    """
    if isinstance(preds, tuple):
        preds = preds[-1]

    if isinstance(preds, Tensor):
        preds = preds.asnumpy()

    # preds = preds.transpose([1, 0, 2]) # [W, BS, C] -> [BS, W, C]. already did in model head.
    pred_indices = preds.argmax(axis=-1)
    pred_prob = preds.max(axis=-1)

    # print('pred indices: ', pred_indices)
    # print('pred prob: ', pred_prob.shape)

    # TODO: for debug only
    raw_chars = [[self.character[idx] for idx in pred_indices[b]] for b in range(pred_indices.shape[0])]

    texts, confs = self.decode(pred_indices, pred_prob, remove_duplicate=True)

    return {"texts": texts, "confs": confs, "raw_chars": raw_chars}
mindocr.postprocess.rec_postprocess.RecCTCLabelDecode.decode(char_indices, prob=None, remove_duplicate=False)

Convert a sequence of char indices into a text string

PARAMETER DESCRIPTION
char_indices

in shape [BS, W]

TYPE: np.ndarray

RETURNS DESCRIPTION

texts (List[str]) and confs (List[float])

Source code in mindocr\postprocess\rec_postprocess.py
def decode(self, char_indices, prob=None, remove_duplicate=False):
    """
    Convert to a squence of char indices to text string
    Args:
        char_indices (np.ndarray): in shape [BS, W]
    Returns:
        text
    """

    """ convert text-index into text-label. """
    texts = []
    confs = []
    batch_size = len(char_indices)
    for batch_idx in range(batch_size):
        selection = np.ones(len(char_indices[batch_idx]), dtype=bool)
        if remove_duplicate:
            selection[1:] = char_indices[batch_idx][1:] != char_indices[batch_idx][:-1]
        for ignored_token in self.ignore_indices:
            selection &= char_indices[batch_idx] != ignored_token

        char_list = [self.character[text_id] for text_id in char_indices[batch_idx][selection]]
        if prob is not None:
            conf_list = prob[batch_idx][selection]
        else:
            conf_list = [1] * len(selection)
        if len(conf_list) == 0:
            conf_list = [0]

        if self.lower:
            char_list = [x.lower() for x in char_list]

        text = "".join(char_list)

        # result_list.append((text, np.mean(conf_list).tolist()))
        texts.append(text)
        confs.append(np.mean(conf_list))
    return texts, confs
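The duplicate-removal and blank-filtering steps above can be sketched in plain NumPy. The 4-symbol vocabulary and the `ctc_greedy_decode` helper below are hypothetical stand-ins; in the real class the dictionary comes from `self.character` and the blank indices from `self.ignore_indices`:

```python
import numpy as np

# Hypothetical 4-class vocabulary: index 0 is the CTC blank.
character = ["<blank>", "a", "b", "c"]
ignore_indices = [0]

def ctc_greedy_decode(char_indices, remove_duplicate=True):
    texts = []
    for row in char_indices:
        selection = np.ones(len(row), dtype=bool)
        if remove_duplicate:
            # Keep only the first element of each run of repeated indices.
            selection[1:] = row[1:] != row[:-1]
        for ignored in ignore_indices:
            selection &= row != ignored
        texts.append("".join(character[i] for i in row[selection]))
    return texts

# The index run [1, 1, 2, 2, 0, 3] collapses to "abc":
# repeats merged first, then blanks dropped.
decoded = ctc_greedy_decode(np.array([[1, 1, 2, 2, 0, 3]]))
```

Note that merging repeats before dropping blanks matters: a blank between two identical characters (e.g. `a <blank> a`) correctly yields `"aa"` rather than `"a"`.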

mindocr.scheduler

Learning Rate Scheduler

mindocr.scheduler.dynamic_lr

Meta learning rate scheduler.

This module implements the same learning rate schedulers as native PyTorch; see "torch.optim.lr_scheduler" <https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate>_. At present, only constant_lr, linear_lr, polynomial_lr, exponential_lr, step_lr, multi_step_lr, cosine_annealing_lr, and cosine_annealing_warm_restarts_lr are implemented. The number, names, and usage of the positional arguments are exactly the same as in native PyTorch.

However, because each scheduler must explicitly return the learning rate at every step, three additional keyword arguments are introduced: lr, steps_per_epoch, and epochs. lr is the base learning rate passed when creating an optimizer in PyTorch; steps_per_epoch is the number of steps (iterations) per epoch; epochs is the total number of epochs, which together with steps_per_epoch determines the length of the returned learning-rate list.

Most schedulers in PyTorch are coarse-grained: the learning rate is constant within a single epoch. For non-stepwise schedulers, we therefore introduce fine-grained variants in which the learning rate also changes within an epoch. The names of these variants contain the keyword refined, e.g. linear_refined_lr and polynomial_refined_lr.
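As a rough illustration of the coarse-grained vs. refined distinction, the two helpers below sketch linear warmup semantics in plain Python. They are simplified sketches, not the module's actual linear_lr / linear_refined_lr implementations:

```python
def linear_lr_sketch(start_factor, end_factor, total_iters, *, lr, steps_per_epoch, epochs):
    # Epoch-stair variant: the factor changes once per epoch, so every step
    # within an epoch shares the same learning rate.
    lrs = []
    for i in range(steps_per_epoch * epochs):
        epoch_idx = i // steps_per_epoch
        multiplier = min(epoch_idx, total_iters) / total_iters
        lrs.append(lr * (start_factor + multiplier * (end_factor - start_factor)))
    return lrs

def linear_refined_lr_sketch(start_factor, end_factor, total_iters, *, lr, steps_per_epoch, epochs):
    # Refined variant: the fractional epoch index i / steps_per_epoch is used,
    # so the learning rate also changes within a single epoch.
    lrs = []
    for i in range(steps_per_epoch * epochs):
        epoch_frac = i / steps_per_epoch
        multiplier = min(epoch_frac, total_iters) / total_iters
        lrs.append(lr * (start_factor + multiplier * (end_factor - start_factor)))
    return lrs

stair = linear_lr_sketch(0.0, 1.0, 2, lr=0.1, steps_per_epoch=2, epochs=2)
refined = linear_refined_lr_sketch(0.0, 1.0, 2, lr=0.1, steps_per_epoch=2, epochs=2)
```

With 2 steps per epoch, the stair schedule repeats each epoch's value twice, while the refined schedule moves every step.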

mindocr.scheduler.dynamic_lr.cosine_decay_lr(decay_epochs, eta_min, *, eta_max, steps_per_epoch, epochs, num_cycles=1, cycle_decay=1.0)

update every epoch

Source code in mindocr\scheduler\dynamic_lr.py
def cosine_decay_lr(decay_epochs, eta_min, *, eta_max, steps_per_epoch, epochs, num_cycles=1, cycle_decay=1.0):
    """update every epoch"""
    tot_steps = steps_per_epoch * epochs
    lrs = []

    for c in range(num_cycles):
        lr_max = eta_max * (cycle_decay**c)
        delta = 0.5 * (lr_max - eta_min)
        for i in range(steps_per_epoch * decay_epochs):
            t_cur = math.floor(i / steps_per_epoch)
            t_cur = min(t_cur, decay_epochs)
            lr_cur = eta_min + delta * (1.0 + math.cos(math.pi * t_cur / decay_epochs))
            if len(lrs) < tot_steps:
                lrs.append(lr_cur)
            else:
                break

    if epochs > num_cycles * decay_epochs:
        for i in range((epochs - (num_cycles * decay_epochs)) * steps_per_epoch):
            lrs.append(eta_min)

    return lrs
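A single-cycle, plain-Python sketch of the schedule above (ignoring num_cycles and cycle_decay for brevity; the helper name is hypothetical):

```python
import math

def cosine_decay_lr_sketch(decay_epochs, eta_min, *, eta_max, steps_per_epoch, epochs):
    # Single-cycle, epoch-stair cosine decay: t_cur is floored to whole epochs.
    tot_steps = steps_per_epoch * epochs
    delta = 0.5 * (eta_max - eta_min)
    lrs = []
    for i in range(min(steps_per_epoch * decay_epochs, tot_steps)):
        t_cur = i // steps_per_epoch
        lrs.append(eta_min + delta * (1.0 + math.cos(math.pi * t_cur / decay_epochs)))
    # After the decay phase, the schedule holds at eta_min.
    lrs.extend([eta_min] * (tot_steps - len(lrs)))
    return lrs

# 4 epochs, 1 step each, decaying over the first 2 epochs:
lrs = cosine_decay_lr_sketch(2, 0.0, eta_max=1.0, steps_per_epoch=1, epochs=4)
```

The schedule starts at eta_max, follows the cosine curve down for decay_epochs, then stays pinned at eta_min for any remaining epochs.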
mindocr.scheduler.dynamic_lr.cosine_decay_refined_lr(decay_epochs, eta_min, *, eta_max, steps_per_epoch, epochs, num_cycles=1, cycle_decay=1.0)

update every step

Source code in mindocr\scheduler\dynamic_lr.py
def cosine_decay_refined_lr(decay_epochs, eta_min, *, eta_max, steps_per_epoch, epochs, num_cycles=1, cycle_decay=1.0):
    """update every step"""
    tot_steps = steps_per_epoch * epochs
    lrs = []

    for c in range(num_cycles):
        lr_max = eta_max * (cycle_decay**c)
        delta = 0.5 * (lr_max - eta_min)
        for i in range(steps_per_epoch * decay_epochs):
            t_cur = i / steps_per_epoch
            t_cur = min(t_cur, decay_epochs)
            lr_cur = eta_min + delta * (1.0 + math.cos(math.pi * t_cur / decay_epochs))
            if len(lrs) < tot_steps:
                lrs.append(lr_cur)
            else:
                break

    if epochs > num_cycles * decay_epochs:
        for i in range((epochs - (num_cycles * decay_epochs)) * steps_per_epoch):
            lrs.append(eta_min)

    return lrs
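The only difference from cosine_decay_lr is the progress variable: the floored epoch index versus the fractional one. A hypothetical helper makes the contrast concrete:

```python
import math

def cosine_lr_at(i, *, steps_per_epoch, decay_epochs, eta_min, eta_max, refined):
    # The two schedules above differ only in whether t_cur is floored
    # to a whole epoch (stair) or left fractional (refined).
    t_cur = (i / steps_per_epoch) if refined else (i // steps_per_epoch)
    t_cur = min(t_cur, decay_epochs)
    return eta_min + 0.5 * (eta_max - eta_min) * (1.0 + math.cos(math.pi * t_cur / decay_epochs))

# Mid-epoch step (i=1 of a 2-step epoch): the stair schedule still reports
# the epoch-0 value, while the refined schedule has already moved.
stair = cosine_lr_at(1, steps_per_epoch=2, decay_epochs=2, eta_min=0.0, eta_max=1.0, refined=False)
ref = cosine_lr_at(1, steps_per_epoch=2, decay_epochs=2, eta_min=0.0, eta_max=1.0, refined=True)
```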
mindocr.scheduler.multi_step_decay_lr

MultiStep Decay Learning Rate Scheduler

mindocr.scheduler.multi_step_decay_lr.MultiStepDecayLR

Bases: LearningRateSchedule

Multi-step learning rate scheduler. The learning rate decays once the number of steps reaches one of the milestones.

Source code in mindocr\scheduler\multi_step_decay_lr.py
class MultiStepDecayLR(LearningRateSchedule):
    """Multiple step learning rate
    The learning rate will decay once the number of step reaches one of the milestones.
    """

    def __init__(self, lr, warmup_epochs, decay_rate, milestones, steps_per_epoch, num_epochs):
        super().__init__()
        self.warmup_steps = warmup_epochs * steps_per_epoch
        num_steps = num_epochs * steps_per_epoch
        step_lrs = []
        cur_lr = lr
        k = 0
        for step in range(num_steps):
            if step == milestones[k] * steps_per_epoch:
                cur_lr = cur_lr * decay_rate
                k = min(k + 1, len(milestones) - 1)
            step_lrs.append(cur_lr)
        if self.warmup_steps > 0:
            self.warmup_lr = nn.WarmUpLR(lr, self.warmup_steps)
        self.step_lrs = ms.Tensor(step_lrs, ms.float32)

    def construct(self, global_step):
        if self.warmup_steps > 0 and global_step < self.warmup_steps:
            lr = self.warmup_lr(global_step)
        elif global_step < self.step_lrs.shape[0]:
            lr = self.step_lrs[global_step]
        else:
            lr = self.step_lrs[-1]
        return lr
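The per-step LR table built in `__init__` can be reproduced in plain Python. The `multi_step_lrs` helper below is an illustrative sketch, not part of mindocr:

```python
def multi_step_lrs(lr, decay_rate, milestones, steps_per_epoch, num_epochs):
    # Plain-Python version of the per-step LR table built in __init__ above.
    step_lrs, cur_lr, k = [], lr, 0
    for step in range(num_epochs * steps_per_epoch):
        if step == milestones[k] * steps_per_epoch:
            # Decay when training reaches the first step of a milestone epoch.
            cur_lr *= decay_rate
            k = min(k + 1, len(milestones) - 1)
        step_lrs.append(cur_lr)
    return step_lrs

# LR halves at the start of epochs 1 and 2 (milestones are epoch indices):
lrs = multi_step_lrs(lr=0.1, decay_rate=0.5, milestones=[1, 2], steps_per_epoch=2, num_epochs=3)
```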
mindocr.scheduler.scheduler_factory

Scheduler Factory

mindocr.scheduler.scheduler_factory.create_scheduler(steps_per_epoch, scheduler='constant', lr=0.01, min_lr=1e-06, warmup_epochs=3, warmup_factor=0.0, decay_epochs=10, decay_rate=0.9, milestones=None, num_epochs=200, num_cycles=1, cycle_decay=1.0, lr_epoch_stair=False)

Creates learning rate scheduler by name.

PARAMETER DESCRIPTION
steps_per_epoch

number of steps per epoch.

TYPE: int

scheduler

scheduler name like 'constant', 'cosine_decay', 'step_decay', 'exponential_decay', 'polynomial_decay', 'multi_step_decay'. Default: 'constant'.

TYPE: str DEFAULT: 'constant'

lr

learning rate value. Default: 0.01.

TYPE: float DEFAULT: 0.01

min_lr

lower lr bound for 'cosine_decay' schedulers. Default: 1e-6.

TYPE: float DEFAULT: 1e-06

warmup_epochs

epochs to warmup LR, if scheduler supports. Default: 3.

TYPE: int DEFAULT: 3

warmup_factor

the warmup phase of scheduler is a linearly increasing lr, the beginning factor is warmup_factor, i.e., the lr of the first step/epoch is lr*warmup_factor, and the ending lr in the warmup phase is lr. Default: 0.0

TYPE: float DEFAULT: 0.0

decay_epochs

for 'cosine_decay' schedulers, decay LR to min_lr in decay_epochs. For 'step_decay' scheduler, decay LR by a factor of decay_rate every decay_epochs. Default: 10.

TYPE: int DEFAULT: 10

decay_rate

LR decay rate (default: 0.9)

TYPE: float DEFAULT: 0.9

milestones

list of epoch milestones for 'multi_step_decay' scheduler. Must be increasing.

TYPE: list DEFAULT: None

num_epochs

number of total epochs.

TYPE: int DEFAULT: 200

lr_epoch_stair

If True, LR will be updated in the beginning of each new epoch and the LR will be consistent for each batch in one epoch. Otherwise, learning rate will be updated dynamically in each step. (default=False)

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION

Cell object for computing LR with input of current global steps

Source code in mindocr\scheduler\scheduler_factory.py
def create_scheduler(
    steps_per_epoch: int,
    scheduler: str = "constant",
    lr: float = 0.01,
    min_lr: float = 1e-6,
    warmup_epochs: int = 3,
    warmup_factor: float = 0.0,
    decay_epochs: int = 10,
    decay_rate: float = 0.9,
    milestones: list = None,
    num_epochs: int = 200,
    num_cycles: int = 1,
    cycle_decay: float = 1.0,
    lr_epoch_stair: bool = False,
):
    r"""Creates learning rate scheduler by name.

    Args:
        steps_per_epoch: number of steps per epoch.
        scheduler: scheduler name like 'constant', 'cosine_decay', 'step_decay',
            'exponential_decay', 'polynomial_decay', 'multi_step_decay'. Default: 'constant'.
        lr: learning rate value. Default: 0.01.
        min_lr: lower lr bound for 'cosine_decay' schedulers. Default: 1e-6.
        warmup_epochs: epochs to warmup LR, if scheduler supports. Default: 3.
        warmup_factor: the warmup phase of scheduler is a linearly increasing lr,
            the beginning factor is `warmup_factor`, i.e., the lr of the first step/epoch is lr*warmup_factor,
            and the ending lr in the warmup phase is lr. Default: 0.0
        decay_epochs: for 'cosine_decay' schedulers, decay LR to min_lr in `decay_epochs`.
            For 'step_decay' scheduler, decay LR by a factor of `decay_rate` every `decay_epochs`. Default: 10.
        decay_rate: LR decay rate (default: 0.9)
        milestones: list of epoch milestones for 'multi_step_decay' scheduler. Must be increasing.
        num_epochs: number of total epochs.
        lr_epoch_stair: If True, LR will be updated in the beginning of each new epoch
            and the LR will be consistent for each batch in one epoch.
            Otherwise, learning rate will be updated dynamically in each step. (default=False)
    Returns:
        Cell object for computing LR with input of current global steps
    """
    # check params
    if milestones is None:
        milestones = []

    if warmup_epochs + decay_epochs > num_epochs:
        print("[WARNING]: warmup_epochs + decay_epochs > num_epochs. Please check and reduce decay_epochs!")

    # lr warmup phase
    warmup_lr_scheduler = []
    if warmup_epochs > 0:
        if warmup_factor == 0 and lr_epoch_stair:
            print(
                "[WARNING]: The warmup factor is set to 0, lr of 0-th epoch is always zero! "
                "Recommended value is 0.01."
            )
        warmup_func = linear_lr if lr_epoch_stair else linear_refined_lr
        warmup_lr_scheduler = warmup_func(
            start_factor=warmup_factor,
            end_factor=1.0,
            total_iters=warmup_epochs,
            lr=lr,
            steps_per_epoch=steps_per_epoch,
            epochs=warmup_epochs,
        )

    # lr decay phase
    main_epochs = num_epochs - warmup_epochs
    if scheduler in ["cosine_decay", "warmup_cosine_decay"]:
        cosine_func = cosine_decay_lr if lr_epoch_stair else cosine_decay_refined_lr
        main_lr_scheduler = cosine_func(
            decay_epochs=decay_epochs,
            eta_min=min_lr,
            eta_max=lr,
            steps_per_epoch=steps_per_epoch,
            epochs=main_epochs,
            num_cycles=num_cycles,
            cycle_decay=cycle_decay,
        )
    elif scheduler == "exponential_decay":
        exponential_func = exponential_lr if lr_epoch_stair else exponential_refined_lr
        main_lr_scheduler = exponential_func(
            gamma=decay_rate, lr=lr, steps_per_epoch=steps_per_epoch, epochs=main_epochs
        )
    elif scheduler == "polynomial_decay":
        polynomial_func = polynomial_lr if lr_epoch_stair else polynomial_refined_lr
        main_lr_scheduler = polynomial_func(
            total_iters=main_epochs, power=decay_rate, lr=lr, steps_per_epoch=steps_per_epoch, epochs=main_epochs
        )
    elif scheduler == "step_decay":
        main_lr_scheduler = step_lr(
            step_size=decay_epochs, gamma=decay_rate, lr=lr, steps_per_epoch=steps_per_epoch, epochs=main_epochs
        )
    elif scheduler == "multi_step_decay":
        main_lr_scheduler = multi_step_lr(
            milestones=milestones, gamma=decay_rate, lr=lr, steps_per_epoch=steps_per_epoch, epochs=main_epochs
        )
    elif scheduler == "constant":
        main_lr_scheduler = [lr for _ in range(steps_per_epoch * main_epochs)]
    else:
        raise ValueError(f"Invalid scheduler: {scheduler}")

    # combine
    lr_scheduler = warmup_lr_scheduler + main_lr_scheduler

    return lr_scheduler
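For the simplest case (the 'constant' scheduler with a linear warmup), the phase-stitching above reduces to concatenating two lists. The helper below is a simplified sketch under that assumption, not the actual linear_refined_lr used by create_scheduler:

```python
def constant_with_warmup(lr, warmup_factor, warmup_epochs, num_epochs, steps_per_epoch):
    # Warmup phase: factor rises linearly from warmup_factor to 1.0.
    warmup_steps = warmup_epochs * steps_per_epoch
    warmup = [
        lr * (warmup_factor + (1.0 - warmup_factor) * i / warmup_steps)
        for i in range(warmup_steps)
    ]
    # Main phase: the 'constant' branch is just a flat list of lr values.
    main = [lr] * ((num_epochs - warmup_epochs) * steps_per_epoch)
    return warmup + main

lrs = constant_with_warmup(lr=0.01, warmup_factor=0.0, warmup_epochs=1, num_epochs=3, steps_per_epoch=2)
```

The returned list has exactly num_epochs * steps_per_epoch entries, which is what MindSpore expects when a per-step LR list is passed to an optimizer.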
mindocr.scheduler.warmup_cosine_decay_lr

Cosine Decay with Warmup Learning Rate Scheduler

mindocr.scheduler.warmup_cosine_decay_lr.WarmupCosineDecayLR

Bases: LearningRateSchedule

CosineDecayLR with warmup

PARAMETER DESCRIPTION
min_lr

(float) lower lr bound for 'WarmupCosineDecayLR' schedulers.

max_lr

(float) upper lr bound for 'WarmupCosineDecayLR' schedulers.

warmup_epochs

(int) the number of warm up epochs of learning rate.

decay_epochs

(int) the number of decay epochs of learning rate.

steps_per_epoch

(int) the number of steps per epoch.

step_mode

(bool) determine decay along steps or epochs. True for steps, False for epochs.

DEFAULT: True

The learning rate will increase from 0 to max_lr in warmup_epochs epochs, then decay to min_lr in decay_epochs epochs

Source code in mindocr\scheduler\warmup_cosine_decay_lr.py
class WarmupCosineDecayLR(LearningRateSchedule):
    """CosineDecayLR with warmup
    Args:

        min_lr: (float) lower lr bound for 'WarmupCosineDecayLR' schedulers.
        max_lr: (float) upper lr bound for 'WarmupCosineDecayLR' schedulers.
        warmup_epochs: (int) the number of warm up epochs of learning rate.
        decay_epochs: (int) the number of decay epochs of learning rate.
        steps_per_epoch: (int) the number of steps per epoch.
        step_mode: (bool) determine decay along steps or epochs. True for steps, False for epochs.

    The learning rate will increase from 0 to max_lr in `warmup_epochs` epochs,
    then decay to min_lr in `decay_epochs` epochs
    """

    def __init__(
        self,
        min_lr,
        max_lr,
        warmup_epochs,
        decay_epochs,
        steps_per_epoch,
        step_mode=True,
    ):
        super().__init__()
        self.warmup_steps = warmup_epochs * steps_per_epoch
        self.decay_steps = decay_epochs * steps_per_epoch
        self.decay_epochs = decay_epochs
        self.warmup_epochs = warmup_epochs
        self.steps_per_epoch = steps_per_epoch
        self.step_mode = step_mode
        if self.warmup_steps > 0:
            self.warmup_lr = nn.WarmUpLR(max_lr, self.warmup_steps if step_mode else warmup_epochs)
        self.cosine_decay_lr = nn.CosineDecayLR(min_lr, max_lr, self.decay_steps if step_mode else decay_epochs)

    def step_lr(self, global_step):
        if self.warmup_steps > 0:
            if global_step > self.warmup_steps:
                lr = self.cosine_decay_lr(global_step - self.warmup_steps)
            else:
                lr = self.warmup_lr(global_step)
        else:
            lr = self.cosine_decay_lr(global_step)
        return lr

    def epoch_lr(self, global_step):
        cur_epoch = global_step // self.steps_per_epoch
        if self.warmup_steps > 0:
            if global_step > self.warmup_steps:
                lr = self.cosine_decay_lr(cur_epoch - self.warmup_epochs)
            else:
                lr = self.warmup_lr(cur_epoch)
        else:
            lr = self.cosine_decay_lr(cur_epoch)
        return lr

    def construct(self, global_step):
        if self.step_mode:
            lr = self.step_lr(global_step)
        else:
            lr = self.epoch_lr(global_step)

        return lr
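Numerically, step_lr above behaves roughly like the following pure-Python function. This is an approximation: the exact warmup and cosine formulas come from MindSpore's nn.WarmUpLR and nn.CosineDecayLR:

```python
import math

def warmup_cosine_lr(global_step, *, min_lr, max_lr, warmup_steps, decay_steps):
    # Linear warmup up to max_lr, then cosine decay down to min_lr.
    if warmup_steps > 0 and global_step <= warmup_steps:
        return max_lr * global_step / warmup_steps
    t = min(global_step - warmup_steps, decay_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * t / decay_steps))

sched = dict(min_lr=0.0, max_lr=1.0, warmup_steps=2, decay_steps=4)
peak = warmup_cosine_lr(2, **sched)     # end of warmup: max_lr
halfway = warmup_cosine_lr(4, **sched)  # halfway through decay
floor = warmup_cosine_lr(6, **sched)    # decay finished: min_lr
```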

mindocr.utils

mindocr.utils.callbacks
mindocr.utils.callbacks.EvalSaveCallback

Bases: Callback

Callbacks for evaluation while training

PARAMETER DESCRIPTION
network

network (without loss)

TYPE: nn.Cell

loader

dataloader

TYPE: Dataset DEFAULT: None

ema

if not None, the ema params will be loaded to the network for evaluation.

DEFAULT: None

Source code in mindocr\utils\callbacks.py
class EvalSaveCallback(Callback):
    """
    Callbacks for evaluation while training

    Args:
        network (nn.Cell): network (without loss)
        loader (Dataset): dataloader
        ema: if not None, the ema params will be loaded to the network for evaluation.
    """

    def __init__(
        self,
        network,
        loader=None,
        loss_fn=None,
        postprocessor=None,
        metrics=None,
        pred_cast_fp32=False,
        rank_id=0,
        device_num=None,
        logger=None,
        batch_size=20,
        ckpt_save_dir="./",
        main_indicator="hmean",
        ema=None,
        loader_output_columns=[],
        input_indices=None,
        label_indices=None,
        meta_data_indices=None,
        val_interval=1,
        val_start_epoch=1,
        log_interval=1,
        ckpt_save_policy="top_k",
        ckpt_max_keep=10,
        start_epoch=0,
    ):
        self.rank_id = rank_id
        self.is_main_device = rank_id in [0, None]
        self.loader_eval = loader
        self.network = network
        self.ema = ema
        self.logger = print if logger is None else logger.info
        self.val_interval = val_interval
        self.val_start_epoch = val_start_epoch
        self.log_interval = log_interval
        self.batch_size = batch_size
        if self.loader_eval is not None:
            self.net_evaluator = Evaluator(
                network,
                loader,
                loss_fn,
                postprocessor,
                metrics,
                pred_cast_fp32=pred_cast_fp32,
                loader_output_columns=loader_output_columns,
                input_indices=input_indices,
                label_indices=label_indices,
                meta_data_indices=meta_data_indices,
            )
            self.main_indicator = main_indicator
            self.best_perf = -1e8
        else:
            self.main_indicator = "train_loss"
            self.best_perf = 1e8

        self.ckpt_save_dir = ckpt_save_dir
        if not os.path.exists(ckpt_save_dir):
            os.makedirs(ckpt_save_dir)

        self.last_epoch_end_time = time.time()
        self.epoch_start_time = time.time()
        self.step_start_time = time.time()

        self._loss_avg_meter = AverageMeter()

        self._reduce_sum = ms.ops.AllReduce()
        self._device_num = device_num
        # lamda expression is not supported in jit
        self._loss_reduce = self._reduce if device_num is not None else lambda x: x

        if self.is_main_device:
            self.ckpt_save_policy = ckpt_save_policy
            self.ckpt_manager = CheckpointManager(
                ckpt_save_dir,
                ckpt_save_policy,
                k=ckpt_max_keep,
                prefer_low_perf=(self.main_indicator == "train_loss"),
            )
        self.start_epoch = start_epoch

    @jit
    def _reduce(self, x):
        return self._reduce_sum(x) / self._device_num  # average value across all devices

    def on_train_step_end(self, run_context):
        """
        Print training loss at the end of step.

        Args:
            run_context (RunContext): Context of the train running.
        """
        cb_params = run_context.original_args()
        loss = _handle_loss(cb_params.net_outputs)
        cur_epoch = cb_params.cur_epoch_num
        data_sink_mode = cb_params.dataset_sink_mode
        cur_step_in_epoch = (cb_params.cur_step_num - 1) % cb_params.batch_num + 1

        self._loss_avg_meter.update(self._loss_reduce(loss))

        if not data_sink_mode and cur_step_in_epoch % self.log_interval == 0:
            opt = cb_params.train_network.optimizer
            learning_rate = opt.learning_rate
            cur_lr = learning_rate(opt.global_step - 1).asnumpy().squeeze()
            per_step_time = (time.time() - self.step_start_time) * 1000 / self.log_interval
            fps = self.batch_size * 1000 / per_step_time
            loss = self._loss_avg_meter.val.asnumpy()
            msg = (
                f"epoch: [{cur_epoch}/{cb_params.epoch_num}] step: [{cur_step_in_epoch}/{cb_params.batch_num}], "
                f"loss: {loss:.6f}, lr: {cur_lr:.6f}, per step time: {per_step_time:.3f} ms, fps: {fps:.2f} img/s"
            )
            self.logger(msg)
            self.step_start_time = time.time()

    def on_train_epoch_begin(self, run_context):
        """
        Called before each epoch beginning.
        Args:
            run_context (RunContext): Include some information of the model.
        """
        self._loss_avg_meter.reset()
        self.epoch_start_time = time.time()
        self.step_start_time = time.time()

    def on_train_epoch_end(self, run_context):
        """
        Called after each training epoch end.

        Args:
            run_context (RunContext): Include some information of the model.
        """
        cb_params = run_context.original_args()
        cur_epoch = cb_params.cur_epoch_num
        train_time = time.time() - self.epoch_start_time
        train_loss = self._loss_avg_meter.avg.asnumpy()

        data_sink_mode = cb_params.dataset_sink_mode
        if data_sink_mode:
            loss_scale_manager = cb_params.train_network.network.loss_scaling_manager
        else:
            loss_scale_manager = cb_params.train_network.loss_scaling_manager

        epoch_time = time.time() - self.epoch_start_time
        per_step_time = epoch_time * 1000 / cb_params.batch_num
        fps = 1000 * self.batch_size / per_step_time
        msg = (
            f"epoch: [{cur_epoch}/{cb_params.epoch_num}], loss: {train_loss:.6f}, "
            f"epoch time: {epoch_time:.3f} s, per step time: {per_step_time:.3f} ms, fps: {fps:.2f} img/s"
        )
        self.logger(msg)

        eval_done = False
        if self.loader_eval is not None:
            if cur_epoch >= self.val_start_epoch and (cur_epoch - self.val_start_epoch) % self.val_interval == 0:
                eval_start = time.time()
                if self.ema is not None:
                    # swap ema weight and network weight
                    self.ema.swap_before_eval()
                measures = self.net_evaluator.eval()

                eval_done = True
                if self.is_main_device:
                    perf = measures[self.main_indicator]
                    eval_time = time.time() - eval_start
                    self.logger(f"Performance: {measures}, eval time: {eval_time}")
            else:
                measures = {m_name: None for m_name in self.net_evaluator.metric_names}
                eval_time = 0
                perf = 1e-8
        else:
            perf = train_loss

        # save best models and results using card 0
        if self.is_main_device:
            # save best models
            if (self.main_indicator == "train_loss" and perf < self.best_perf) or (
                self.main_indicator != "train_loss" and eval_done and perf > self.best_perf
            ):  # when val_while_train enabled, only find best checkpoint after eval done.
                self.best_perf = perf
                # ema weight will be saved if enabled.
                save_checkpoint(self.network, os.path.join(self.ckpt_save_dir, "best.ckpt"))

                self.logger(f"=> Best {self.main_indicator}: {self.best_perf}, checkpoint saved.")

            # save history checkpoints
            self.ckpt_manager.save(self.network, perf, ckpt_name=f"e{cur_epoch}.ckpt")
            ms.save_checkpoint(
                cb_params.train_network,
                os.path.join(self.ckpt_save_dir, "train_resume.ckpt"),
                append_dict={"epoch_num": cur_epoch, "loss_scale": loss_scale_manager.get_loss_scale()},
            )
            # record results
            if cur_epoch == 1:
                if self.loader_eval is not None:
                    perf_columns = ["loss"] + list(measures.keys()) + ["train_time", "eval_time"]
                else:
                    perf_columns = ["loss", "train_time"]
                self.rec = PerfRecorder(self.ckpt_save_dir, metric_names=perf_columns)  # record column names
            elif cur_epoch == self.start_epoch + 1:
                self.rec = PerfRecorder(self.ckpt_save_dir, resume=True)

            if self.loader_eval is not None:
                epoch_perf_values = [cur_epoch, train_loss] + list(measures.values()) + [train_time, eval_time]
            else:
                epoch_perf_values = [cur_epoch, train_loss, train_time]
            self.rec.add(*epoch_perf_values)  # record column values

        # swap back network weight and ema weight. MUST execute after model saving and before next-step training
        if (self.ema is not None) and eval_done:
            self.ema.swap_after_eval()

        # tot_time = time.time() - self.last_epoch_end_time
        self.last_epoch_end_time = time.time()

    def on_train_end(self, run_context):
        if self.is_main_device:
            self.rec.save_curves()  # save performance curve figure
            self.logger(f"=> Best {self.main_indicator}: {self.best_perf} \nTraining completed!")

            if self.ckpt_save_policy == "top_k":
                log_str = f"Top K checkpoints:\n{self.main_indicator}\tcheckpoint\n"
                for p, ckpt_name in self.ckpt_manager.get_ckpt_queue():
                    log_str += f"{p:.4f}\t{os.path.join(self.ckpt_save_dir, ckpt_name)}\n"
                self.logger(log_str)
mindocr.utils.callbacks.EvalSaveCallback.on_train_epoch_begin(run_context)

Called before each epoch beginning.

PARAMETER DESCRIPTION
run_context

Include some information of the model.

TYPE: RunContext

Source code in mindocr\utils\callbacks.py
def on_train_epoch_begin(self, run_context):
    """
    Called before each epoch beginning.
    Args:
        run_context (RunContext): Include some information of the model.
    """
    self._loss_avg_meter.reset()
    self.epoch_start_time = time.time()
    self.step_start_time = time.time()
mindocr.utils.callbacks.EvalSaveCallback.on_train_epoch_end(run_context)

Called after each training epoch end.

PARAMETER DESCRIPTION
run_context

Include some information of the model.

TYPE: RunContext

Source code in mindocr\utils\callbacks.py
def on_train_epoch_end(self, run_context):
    """
    Called after each training epoch end.

    Args:
        run_context (RunContext): Include some information of the model.
    """
    cb_params = run_context.original_args()
    cur_epoch = cb_params.cur_epoch_num
    train_time = time.time() - self.epoch_start_time
    train_loss = self._loss_avg_meter.avg.asnumpy()

    data_sink_mode = cb_params.dataset_sink_mode
    if data_sink_mode:
        loss_scale_manager = cb_params.train_network.network.loss_scaling_manager
    else:
        loss_scale_manager = cb_params.train_network.loss_scaling_manager

    epoch_time = time.time() - self.epoch_start_time
    per_step_time = epoch_time * 1000 / cb_params.batch_num
    fps = 1000 * self.batch_size / per_step_time
    msg = (
        f"epoch: [{cur_epoch}/{cb_params.epoch_num}], loss: {train_loss:.6f}, "
        f"epoch time: {epoch_time:.3f} s, per step time: {per_step_time:.3f} ms, fps: {fps:.2f} img/s"
    )
    self.logger(msg)

    eval_done = False
    if self.loader_eval is not None:
        if cur_epoch >= self.val_start_epoch and (cur_epoch - self.val_start_epoch) % self.val_interval == 0:
            eval_start = time.time()
            if self.ema is not None:
                # swap ema weight and network weight
                self.ema.swap_before_eval()
            measures = self.net_evaluator.eval()

            eval_done = True
            if self.is_main_device:
                perf = measures[self.main_indicator]
                eval_time = time.time() - eval_start
                self.logger(f"Performance: {measures}, eval time: {eval_time}")
        else:
            measures = {m_name: None for m_name in self.net_evaluator.metric_names}
            eval_time = 0
            perf = 1e-8
    else:
        perf = train_loss

    # save best models and results using card 0
    if self.is_main_device:
        # save best models
        if (self.main_indicator == "train_loss" and perf < self.best_perf) or (
            self.main_indicator != "train_loss" and eval_done and perf > self.best_perf
        ):  # when val_while_train enabled, only find best checkpoint after eval done.
            self.best_perf = perf
            # ema weight will be saved if enabled.
            save_checkpoint(self.network, os.path.join(self.ckpt_save_dir, "best.ckpt"))

            self.logger(f"=> Best {self.main_indicator}: {self.best_perf}, checkpoint saved.")

        # save history checkpoints
        self.ckpt_manager.save(self.network, perf, ckpt_name=f"e{cur_epoch}.ckpt")
        ms.save_checkpoint(
            cb_params.train_network,
            os.path.join(self.ckpt_save_dir, "train_resume.ckpt"),
            append_dict={"epoch_num": cur_epoch, "loss_scale": loss_scale_manager.get_loss_scale()},
        )
        # record results
        if cur_epoch == 1:
            if self.loader_eval is not None:
                perf_columns = ["loss"] + list(measures.keys()) + ["train_time", "eval_time"]
            else:
                perf_columns = ["loss", "train_time"]
            self.rec = PerfRecorder(self.ckpt_save_dir, metric_names=perf_columns)  # record column names
        elif cur_epoch == self.start_epoch + 1:
            self.rec = PerfRecorder(self.ckpt_save_dir, resume=True)

        if self.loader_eval is not None:
            epoch_perf_values = [cur_epoch, train_loss] + list(measures.values()) + [train_time, eval_time]
        else:
            epoch_perf_values = [cur_epoch, train_loss, train_time]
        self.rec.add(*epoch_perf_values)  # record column values

    # swap back network weight and ema weight. MUST execute after model saving and before next-step training
    if (self.ema is not None) and eval_done:
        self.ema.swap_after_eval()

    # tot_time = time.time() - self.last_epoch_end_time
    self.last_epoch_end_time = time.time()
mindocr.utils.callbacks.EvalSaveCallback.on_train_step_end(run_context)

Print training loss at the end of step.

PARAMETER DESCRIPTION
run_context

Context of the train running.

TYPE: RunContext

Source code in mindocr\utils\callbacks.py
def on_train_step_end(self, run_context):
    """
    Print training loss at the end of step.

    Args:
        run_context (RunContext): Context of the train running.
    """
    cb_params = run_context.original_args()
    loss = _handle_loss(cb_params.net_outputs)
    cur_epoch = cb_params.cur_epoch_num
    data_sink_mode = cb_params.dataset_sink_mode
    cur_step_in_epoch = (cb_params.cur_step_num - 1) % cb_params.batch_num + 1

    self._loss_avg_meter.update(self._loss_reduce(loss))

    if not data_sink_mode and cur_step_in_epoch % self.log_interval == 0:
        opt = cb_params.train_network.optimizer
        learning_rate = opt.learning_rate
        cur_lr = learning_rate(opt.global_step - 1).asnumpy().squeeze()
        per_step_time = (time.time() - self.step_start_time) * 1000 / self.log_interval
        fps = self.batch_size * 1000 / per_step_time
        loss = self._loss_avg_meter.val.asnumpy()
        msg = (
            f"epoch: [{cur_epoch}/{cb_params.epoch_num}] step: [{cur_step_in_epoch}/{cb_params.batch_num}], "
            f"loss: {loss:.6f}, lr: {cur_lr:.6f}, per step time: {per_step_time:.3f} ms, fps: {fps:.2f} img/s"
        )
        self.logger(msg)
        self.step_start_time = time.time()
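The per-step-time and throughput arithmetic in the callback above can be checked with a plain-Python sketch (no MindSpore needed; the function name and numbers are illustrative, not part of mindocr):

```python
def step_stats(elapsed_s, log_interval, batch_size):
    """Reproduce the callback's arithmetic: elapsed wall time over the last
    `log_interval` steps gives ms per step, from which fps (images/s) follows."""
    per_step_time_ms = elapsed_s * 1000 / log_interval  # ms per training step
    fps = batch_size * 1000 / per_step_time_ms          # images per second
    return per_step_time_ms, fps

# e.g. 100 steps logged after 20 s of training with batch size 32
per_step, fps = step_stats(20.0, 100, 32)
```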
mindocr.utils.checkpoint

checkpoint manager

mindocr.utils.checkpoint.CheckpointManager

Manage checkpoint files according to the ckpt_save_policy.

PARAMETER DESCRIPTION
ckpt_save_dir

directory to save the checkpoints

TYPE: str

ckpt_save_policy

Checkpoint saving strategy. Option: None, "top_k", or "latest_k". None means to save each checkpoint, top_k means to keep the K checkpoints with the best performance, and latest_k means to keep the latest K checkpoints. Default: top_k.

TYPE: str DEFAULT: 'top_k'

k

top k value

TYPE: int DEFAULT: 10

prefer_low_perf

standard for selecting the top k performance. If False, pick top k checkpoints with highest performance e.g. accuracy. If True, pick top k checkpoints with the lowest performance, e.g. loss.

TYPE: bool DEFAULT: False

Source code in mindocr\utils\checkpoint.py
class CheckpointManager:
    """
    Manage checkpoint files according to the ckpt_save_policy.
    Args:
        ckpt_save_dir (str): directory to save the checkpoints
        ckpt_save_policy (str): Checkpoint saving strategy. Option: None, "top_k", or "latest_k".
            None means to save each checkpoint, top_k means to save K checkpoints with the best performance,
            and latest_k means to keep the latest K checkpoints. Default: top_k.
        k (int): top k value
        prefer_low_perf (bool): standard for selecting the top k performance. If False, pick top k checkpoints with
            highest performance e.g. accuracy. If True, pick top k checkpoints with the lowest performance, e.g. loss.

    """

    def __init__(self, ckpt_save_dir, ckpt_save_policy="top_k", k=10, prefer_low_perf=False, del_past=True):
        self.ckpt_save_dir = ckpt_save_dir
        self._ckpt_filelist = []
        self.ckpt_save_policy = ckpt_save_policy
        self.k = k

        self.ckpt_queue = []
        self.del_past = del_past
        self.prefer_low_perf = prefer_low_perf

    def get_ckpt_queue(self):
        """Get all the related checkpoint files managed here."""
        return self.ckpt_queue

    @property
    def ckpt_num(self):
        """Get the number of the related checkpoint files managed here."""
        return len(self.ckpt_queue)

    def remove_ckpt_file(self, file_name):
        """Remove the specified checkpoint file from this checkpoint manager and also from the directory."""
        try:
            if os.path.exists(file_name):
                os.chmod(file_name, stat.S_IWRITE)
                os.remove(file_name)
        except OSError:
            logger.warning("OSError, failed to remove the older ckpt file %s.", file_name)
        except ValueError:
            logger.warning("ValueError, failed to remove the older ckpt file %s.", file_name)

    def save_top_k(self, network, perf, ckpt_name, verbose=True):
        """Save and return Top K checkpoint address and accuracy."""
        self.ckpt_queue.append((perf, ckpt_name))
        self.ckpt_queue = sorted(
            self.ckpt_queue, key=lambda x: x[0], reverse=not self.prefer_low_perf
        )  # by default, reverse is True for descending order
        if len(self.ckpt_queue) > self.k:
            to_del = self.ckpt_queue.pop(-1)
            # save if the perf is better than the minimum in the heap
            if to_del[1] != ckpt_name:
                ms.save_checkpoint(network, os.path.join(self.ckpt_save_dir, ckpt_name))
                # del minimum
                self.remove_ckpt_file(os.path.join(self.ckpt_save_dir, to_del[1]))
        else:
            ms.save_checkpoint(network, os.path.join(self.ckpt_save_dir, ckpt_name))

    def save_latest_k(self, network, ckpt_name):
        """Save latest K checkpoint."""
        ms.save_checkpoint(network, os.path.join(self.ckpt_save_dir, ckpt_name))
        self.ckpt_queue.append(ckpt_name)
        if len(self.ckpt_queue) > self.k:
            to_del = self.ckpt_queue.pop(0)
            if self.del_past:
                self.remove_ckpt_file(os.path.join(self.ckpt_save_dir, to_del))

    def save_single(self, network, ckpt_path):
        ms.save_checkpoint(network, ckpt_path)

    def save(self, network, perf=None, ckpt_name=None):
        """Save checkpoint according to different save strategy."""
        if self.ckpt_save_policy is None:
            ms.save_checkpoint(network, os.path.join(self.ckpt_save_dir, ckpt_name))
        elif self.ckpt_save_policy == "top_k":
            if perf is None:
                raise ValueError(
                    "Evaluation performance is None, but `top_k` ckpt save policy requires evaluation performance"
                )
            self.save_top_k(network, perf, ckpt_name)
            return self.ckpt_queue
        elif self.ckpt_save_policy == "latest_k":
            self.save_latest_k(network, ckpt_name)
            return self.ckpt_queue
        else:
            raise ValueError(
                f"The expected 'ckpt_save_policy' is None, top_k or latest_k, but got: {self.ckpt_save_policy}."
            )
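The `top_k` policy above boils down to maintaining a sorted queue of `(perf, name)` pairs and evicting the worst entry once more than K are tracked. A framework-free sketch of that bookkeeping (function and checkpoint names are illustrative, not mindocr API):

```python
def top_k_update(queue, perf, name, k, prefer_low_perf=False):
    """Mirror CheckpointManager.save_top_k bookkeeping: insert the new
    (perf, name) pair, keep the k best, and report which file to delete.
    Returns (new_queue, name_to_delete_or_None)."""
    queue = sorted(queue + [(perf, name)], key=lambda x: x[0],
                   reverse=not prefer_low_perf)  # best-first by default
    if len(queue) > k:
        worst = queue.pop(-1)
        # if the newcomer itself is the worst, it is simply never saved
        return queue, (worst[1] if worst[1] != name else None)
    return queue, None

q, deleted = [], []
for perf, name in [(0.80, "e1.ckpt"), (0.85, "e2.ckpt"), (0.82, "e3.ckpt")]:
    q, to_del = top_k_update(q, perf, name, k=2)
    deleted.append(to_del)
```

With `k=2`, the third checkpoint (0.82) displaces the weakest one on disk (e1.ckpt, 0.80), matching the delete-the-minimum comment in `save_top_k`.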
mindocr.utils.checkpoint.CheckpointManager.ckpt_num property

Get the number of the related checkpoint files managed here.

mindocr.utils.checkpoint.CheckpointManager.get_ckpt_queue()

Get all the related checkpoint files managed here.

Source code in mindocr\utils\checkpoint.py
def get_ckpt_queue(self):
    """Get all the related checkpoint files managed here."""
    return self.ckpt_queue
mindocr.utils.checkpoint.CheckpointManager.remove_ckpt_file(file_name)

Remove the specified checkpoint file from this checkpoint manager and also from the directory.

Source code in mindocr\utils\checkpoint.py
def remove_ckpt_file(self, file_name):
    """Remove the specified checkpoint file from this checkpoint manager and also from the directory."""
    try:
        if os.path.exists(file_name):
            os.chmod(file_name, stat.S_IWRITE)
            os.remove(file_name)
    except OSError:
        logger.warning("OSError, failed to remove the older ckpt file %s.", file_name)
    except ValueError:
        logger.warning("ValueError, failed to remove the older ckpt file %s.", file_name)
mindocr.utils.checkpoint.CheckpointManager.save(network, perf=None, ckpt_name=None)

Save checkpoint according to different save strategy.

Source code in mindocr\utils\checkpoint.py
def save(self, network, perf=None, ckpt_name=None):
    """Save checkpoint according to different save strategy."""
    if self.ckpt_save_policy is None:
        ms.save_checkpoint(network, os.path.join(self.ckpt_save_dir, ckpt_name))
    elif self.ckpt_save_policy == "top_k":
        if perf is None:
            raise ValueError(
                "Evaluation performance is None, but `top_k` ckpt save policy requires evaluation performance"
            )
        self.save_top_k(network, perf, ckpt_name)
        return self.ckpt_queue
    elif self.ckpt_save_policy == "latest_k":
        self.save_latest_k(network, ckpt_name)
        return self.ckpt_queue
    else:
        raise ValueError(
            f"The expected 'ckpt_save_policy' is None, top_k or latest_k, but got: {self.ckpt_save_policy}."
        )
mindocr.utils.checkpoint.CheckpointManager.save_latest_k(network, ckpt_name)

Save the checkpoint and keep only the latest K checkpoints.

Source code in mindocr\utils\checkpoint.py
def save_latest_k(self, network, ckpt_name):
    """Save latest K checkpoint."""
    ms.save_checkpoint(network, os.path.join(self.ckpt_save_dir, ckpt_name))
    self.ckpt_queue.append(ckpt_name)
    if len(self.ckpt_queue) > self.k:
        to_del = self.ckpt_queue.pop(0)
        if self.del_past:
            self.remove_ckpt_file(os.path.join(self.ckpt_save_dir, to_del))
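The `latest_k` policy is a plain FIFO: each save appends, and once more than K names are queued the oldest is evicted for deletion. A minimal sketch of that queue logic (names are illustrative):

```python
from collections import deque

def latest_k_save(queue, name, k):
    """Mirror save_latest_k's queue handling: append the new checkpoint name
    and return the evicted (oldest) name once more than k are kept, else None."""
    queue.append(name)
    if len(queue) > k:
        return queue.popleft()
    return None

q = deque()
evicted = [latest_k_save(q, f"e{i}.ckpt", k=3) for i in range(1, 6)]
```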
mindocr.utils.checkpoint.CheckpointManager.save_top_k(network, perf, ckpt_name, verbose=True)

Save the checkpoint and keep only the top K checkpoints ranked by performance.

Source code in mindocr\utils\checkpoint.py
def save_top_k(self, network, perf, ckpt_name, verbose=True):
    """Save and return Top K checkpoint address and accuracy."""
    self.ckpt_queue.append((perf, ckpt_name))
    self.ckpt_queue = sorted(
        self.ckpt_queue, key=lambda x: x[0], reverse=not self.prefer_low_perf
    )  # by default, reverse is True for descending order
    if len(self.ckpt_queue) > self.k:
        to_del = self.ckpt_queue.pop(-1)
        # save if the perf is better than the minimum in the heap
        if to_del[1] != ckpt_name:
            ms.save_checkpoint(network, os.path.join(self.ckpt_save_dir, ckpt_name))
            # del minimum
            self.remove_ckpt_file(os.path.join(self.ckpt_save_dir, to_del[1]))
    else:
        ms.save_checkpoint(network, os.path.join(self.ckpt_save_dir, ckpt_name))
mindocr.utils.ema
mindocr.utils.ema.EMA

Bases: nn.Cell

PARAMETER DESCRIPTION
updates

number of ema updates, which can be restored from resumed training.

DEFAULT: 0

Source code in mindocr\utils\ema.py
class EMA(nn.Cell):
    """
    Args:
        updates: number of ema updates, which can be restored from resumed training.
    """

    def __init__(self, network, ema_decay=0.9999, updates=0):
        super().__init__()
        # TODO: net.trainable_params() is more reasonable?
        self.net_weight = ms.ParameterTuple(network.get_parameters())
        self.ema_weight = self.net_weight.clone(prefix="ema", init="same")
        self.swap_cache = self.net_weight.clone(prefix="swap", init="zeros")

        self.ema_decay = ema_decay
        self.updates = Parameter(Tensor(updates, ms.float32), requires_grad=False)

        self.hyper_map = C.HyperMap()
        self.map = ops.HyperMap()

    def ema_update(self):
        """Update EMA parameters."""
        self.updates += 1
        d = self.ema_decay * (1 - F.exp(-self.updates / 2000))
        # update trainable parameters
        success = self.hyper_map(F.partial(_ema_op, d), self.ema_weight, self.net_weight)
        self.updates = F.depend(self.updates, success)
        return self.updates

    # @ms_function
    def swap_before_eval(self):
        # net -> swap
        success = self.map(ops.assign, self.swap_cache, self.net_weight)
        # ema -> net
        success = F.depend(success, self.map(ops.assign, self.net_weight, self.ema_weight))
        return success

    # @ms_function
    def swap_after_eval(self):
        # swap -> net
        success = self.map(ops.assign, self.net_weight, self.swap_cache)
        return success
mindocr.utils.ema.EMA.ema_update()

Update EMA parameters.

Source code in mindocr\utils\ema.py
def ema_update(self):
    """Update EMA parameters."""
    self.updates += 1
    d = self.ema_decay * (1 - F.exp(-self.updates / 2000))
    # update trainable parameters
    success = self.hyper_map(F.partial(_ema_op, d), self.ema_weight, self.net_weight)
    self.updates = F.depend(self.updates, success)
    return self.updates
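The effective decay in `ema_update`, `d = ema_decay * (1 - exp(-updates / 2000))`, starts near zero (the EMA tracks the raw weights closely early in training) and ramps towards `ema_decay` as updates accumulate. A numeric check of that ramp in plain Python (the 2000 time constant comes from the source above; the helper name is illustrative):

```python
import math

def ema_decay_at(update, base_decay=0.9999, tau=2000):
    """Effective EMA decay after `update` steps, as computed in ema_update."""
    return base_decay * (1 - math.exp(-update / tau))

d_early = ema_decay_at(100)    # warm-up: small decay, fast tracking
d_late = ema_decay_at(20000)   # converged: decay close to base_decay
```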
mindocr.utils.evaluator
mindocr.utils.evaluator.Evaluator
PARAMETER DESCRIPTION
network

network

dataloader

data loader to generate batch data, where the data columns in a batch are defined by the transform pipeline and output_columns.

loss_fn

loss function

DEFAULT: None

postprocessor

post-processor

DEFAULT: None

metrics

metrics to evaluate network performance

DEFAULT: None

pred_cast_fp32

whether to cast network prediction to float 32. Set True if AMP is used.

DEFAULT: False

input_indices

The indices of the data tuples which will be fed into the network. If it is None, then the first item will be fed only.

DEFAULT: None

label_indices

The indices of the data tuples which will be marked as label. If it is None, then the remaining items will be marked as label.

DEFAULT: None

meta_data_indices

The indices for the data tuples which will be marked as metadata. If it is None, then the item indices not in input or label indices are marked as metadata.

DEFAULT: None

Source code in mindocr\utils\evaluator.py
class Evaluator:
    """
    Args:
        network: network
        dataloader : data loader to generate batch data, where the data columns in a batch are defined by the transform
            pipeline and `output_columns`.
        loss_fn: loss function
        postprocessor: post-processor
        metrics: metrics to evaluate network performance
        pred_cast_fp32: whether to cast network prediction to float 32. Set True if AMP is used.
        input_indices: The indices of the data tuples which will be fed into the network.
            If it is None, then the first item will be fed only.
        label_indices: The indices of the data tuples which will be marked as label.
            If it is None, then the remaining items will be marked as label.
        meta_data_indices: The indices for the data tuples which will be marked as metadata.
            If it is None, then the item indices not in input or label indices are marked as metadata.
    """

    def __init__(
        self,
        network,
        dataloader,
        loss_fn=None,
        postprocessor=None,
        metrics=None,
        pred_cast_fp32=False,
        loader_output_columns=None,
        input_indices=None,
        label_indices=None,
        meta_data_indices=None,
        num_epochs=-1,
        visualize=False,
        verbose=False,
        **kwargs,
    ):
        self.net = network
        self.postprocessor = postprocessor
        self.metrics = metrics if isinstance(metrics, List) else [metrics]
        self.metric_names = []
        for m in metrics:
            assert hasattr(m, "metric_names") and isinstance(m.metric_names, List), (
                f"Metric object must contain `metric_names` attribute to indicate the metric names as a List type, "
                f"but not found in {m.__class__.__name__}"
            )
            self.metric_names += m.metric_names

        self.pred_cast_fp32 = pred_cast_fp32
        self.visualize = visualize
        self.verbose = verbose
        eval_loss = False
        if loss_fn is not None:
            eval_loss = True
            self.loss_fn = loss_fn
        assert not eval_loss, "not impl"

        # create iterator
        self.reload(
            dataloader,
            loader_output_columns,
            input_indices,
            label_indices,
            meta_data_indices,
            num_epochs,
        )

    def reload(
        self,
        dataloader,
        loader_output_columns=None,
        input_indices=None,
        label_indices=None,
        meta_data_indices=None,
        num_epochs=-1,
    ):
        # create iterator
        self.iterator = dataloader.create_tuple_iterator(num_epochs=num_epochs, output_numpy=False, do_copy=False)
        self.num_batches_eval = dataloader.get_dataset_size()

        # dataset output columns
        self.loader_output_columns = loader_output_columns or []
        self.input_indices = input_indices
        self.label_indices = label_indices
        self.meta_data_indices = meta_data_indices

    def eval(self):
        """
        Run evaluation over the dataloader and return a dict of metric results.
        """
        eval_res = {}

        self.net.set_train(False)
        for m in self.metrics:
            m.clear()

        for i, data in tqdm(enumerate(self.iterator), total=self.num_batches_eval):
            if self.input_indices is not None:
                inputs = [data[x] for x in self.input_indices]
            else:
                inputs = [data[0]]

            if self.label_indices is not None:
                gt = [data[x] for x in self.label_indices]
            else:
                gt = data[1:]

            preds = self.net(*inputs)

            if self.pred_cast_fp32:
                if isinstance(preds, ms.Tensor):
                    preds = F.cast(preds, mstype.float32)
                else:
                    preds = [F.cast(p, mstype.float32) for p in preds]

            data_info = {"labels": gt, "img_shape": inputs[0].shape}

            if self.postprocessor is not None:
                # additional info such as image path, original image size, pad shape, extracted in data processing
                if self.meta_data_indices is not None:
                    meta_info = [data[x] for x in self.meta_data_indices]
                else:
                    # assume the indices not in input_indices or label_indices are all meta_data_indices
                    input_indices = set(self.input_indices) if self.input_indices is not None else {0}
                    label_indices = (
                        set(self.label_indices) if self.label_indices is not None else set(range(1, len(data), 1))
                    )
                    meta_data_indices = sorted(set(range(len(data))) - input_indices - label_indices)
                    meta_info = [data[x] for x in meta_data_indices]

                data_info["meta_info"] = meta_info

                # NOTES: add more if new postprocess modules need new keys. shape_list is commonly needed by detection
                possible_keys_for_postprocess = ["shape_list", "raw_img_shape"]
                # TODO: remove raw_img_shape (used in tools/infer/text/parallel).
                #  shape_list = [h, w, ratio_h, ratio_w] already contain raw image shape.
                for k in possible_keys_for_postprocess:
                    if k in self.loader_output_columns:
                        data_info[k] = data[self.loader_output_columns.index(k)]

                preds = self.postprocessor(preds, **data_info)

            # metric internal update
            for m in self.metrics:
                m.update(preds, gt)

            if self.verbose:
                print("Data meta info: ", data_info)

        for m in self.metrics:
            res_dict = m.eval()
            eval_res.update(res_dict)

        self.net.set_train(True)

        return eval_res
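The index-resolution fallback inside `eval` is worth isolating: when `meta_data_indices` is None, inputs default to column 0, labels to the remaining columns, and metadata to whatever is in neither set. A standalone sketch of that inference (the helper name is illustrative, not mindocr API):

```python
def infer_meta_indices(num_columns, input_indices=None, label_indices=None):
    """Default index split used by Evaluator.eval: inputs default to {0},
    labels to all remaining columns, and meta-data indices are the leftovers."""
    inputs = set(input_indices) if input_indices is not None else {0}
    labels = set(label_indices) if label_indices is not None else set(range(1, num_columns))
    return sorted(set(range(num_columns)) - inputs - labels)

# 5 columns: image, two label tensors, shape_list, image path
meta = infer_meta_indices(5, input_indices=[0], label_indices=[1, 2])
```

Note that with both index lists left as None, the label set swallows every non-input column, so no metadata columns remain.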
mindocr.utils.evaluator.Evaluator.eval()
Source code in mindocr\utils\evaluator.py
def eval(self):
    """
    Run evaluation over the dataloader and return a dict of metric results.
    """
    eval_res = {}

    self.net.set_train(False)
    for m in self.metrics:
        m.clear()

    for i, data in tqdm(enumerate(self.iterator), total=self.num_batches_eval):
        if self.input_indices is not None:
            inputs = [data[x] for x in self.input_indices]
        else:
            inputs = [data[0]]

        if self.label_indices is not None:
            gt = [data[x] for x in self.label_indices]
        else:
            gt = data[1:]

        preds = self.net(*inputs)

        if self.pred_cast_fp32:
            if isinstance(preds, ms.Tensor):
                preds = F.cast(preds, mstype.float32)
            else:
                preds = [F.cast(p, mstype.float32) for p in preds]

        data_info = {"labels": gt, "img_shape": inputs[0].shape}

        if self.postprocessor is not None:
            # additional info such as image path, original image size, pad shape, extracted in data processing
            if self.meta_data_indices is not None:
                meta_info = [data[x] for x in self.meta_data_indices]
            else:
                # assume the indices not in input_indices or label_indices are all meta_data_indices
                input_indices = set(self.input_indices) if self.input_indices is not None else {0}
                label_indices = (
                    set(self.label_indices) if self.label_indices is not None else set(range(1, len(data), 1))
                )
                meta_data_indices = sorted(set(range(len(data))) - input_indices - label_indices)
                meta_info = [data[x] for x in meta_data_indices]

            data_info["meta_info"] = meta_info

            # NOTES: add more if new postprocess modules need new keys. shape_list is commonly needed by detection
            possible_keys_for_postprocess = ["shape_list", "raw_img_shape"]
            # TODO: remove raw_img_shape (used in tools/infer/text/parallel).
            #  shape_list = [h, w, ratio_h, ratio_w] already contain raw image shape.
            for k in possible_keys_for_postprocess:
                if k in self.loader_output_columns:
                    data_info[k] = data[self.loader_output_columns.index(k)]

            preds = self.postprocessor(preds, **data_info)

        # metric internal update
        for m in self.metrics:
            m.update(preds, gt)

        if self.verbose:
            print("Data meta info: ", data_info)

    for m in self.metrics:
        res_dict = m.eval()
        eval_res.update(res_dict)

    self.net.set_train(True)

    return eval_res
mindocr.utils.logger

Custom Logger.

mindocr.utils.logger.Logger

Bases: logging.Logger

Logger.

PARAMETER DESCRIPTION
logger_name

String. Logger name.

rank

Integer. Rank id.

DEFAULT: 0

Source code in mindocr\utils\logger.py
class Logger(logging.Logger):
    """
    Logger.

    Args:
         logger_name: String. Logger name.
         rank: Integer. Rank id.
    """

    def __init__(self, logger_name, rank=0, log_fn=None):
        super(Logger, self).__init__(logger_name)
        self.rank = rank or 0
        self.log_fn = log_fn
        is_main_device = not rank

        if is_main_device:
            console = logging.StreamHandler(sys.stdout)
            console.setLevel(logging.INFO)
            formatter = logging.Formatter("%(asctime)s:%(levelname)s:%(message)s")
            console.setFormatter(formatter)
            self.addHandler(console)

    def setup_logging_file(self, log_dir):
        """Setup logging file."""
        if not os.path.exists(log_dir):
            os.makedirs(log_dir, exist_ok=True)
        if self.log_fn is None:
            log_name = "log_%s.txt" % self.rank
            self.log_save_path = os.path.join(log_dir, log_name)
        else:
            self.log_save_path = os.path.join(log_dir, self.log_fn)
        fh = logging.FileHandler(self.log_save_path)
        fh.setLevel(logging.INFO)
        formatter = logging.Formatter("%(asctime)s:%(levelname)s:%(message)s")
        fh.setFormatter(formatter)
        self.addHandler(fh)

    def info(self, msg, *args, **kwargs):
        if self.isEnabledFor(logging.INFO):
            self._log(logging.INFO, msg, args, **kwargs)

    def save_args(self, args):
        self.info("Args:")
        args_dict = vars(args)
        for key in args_dict.keys():
            self.info("--> %s: %s", key, args_dict[key])
        self.info("")

    def important_info(self, msg, *args, **kwargs):
        if self.isEnabledFor(logging.INFO) and self.rank == 0:
            line_width = 2
            important_msg = "\n"
            important_msg += ("*" * 70 + "\n") * line_width
            important_msg += ("*" * line_width + "\n") * 2
            important_msg += "*" * line_width + " " * 8 + msg + "\n"
            important_msg += ("*" * line_width + "\n") * 2
            important_msg += ("*" * 70 + "\n") * line_width
            self.info(important_msg, *args, **kwargs)
mindocr.utils.logger.Logger.setup_logging_file(log_dir)

Setup logging file.

Source code in mindocr\utils\logger.py
def setup_logging_file(self, log_dir):
    """Setup logging file."""
    if not os.path.exists(log_dir):
        os.makedirs(log_dir, exist_ok=True)
    if self.log_fn is None:
        log_name = "log_%s.txt" % self.rank
        self.log_save_path = os.path.join(log_dir, log_name)
    else:
        self.log_save_path = os.path.join(log_dir, self.log_fn)
    fh = logging.FileHandler(self.log_save_path)
    fh.setLevel(logging.INFO)
    formatter = logging.Formatter("%(asctime)s:%(levelname)s:%(message)s")
    fh.setFormatter(formatter)
    self.addHandler(fh)
mindocr.utils.logger.get_logger(log_dir, rank, log_fn=None)

Get Logger.

Source code in mindocr\utils\logger.py
def get_logger(log_dir, rank, log_fn=None):
    """Get Logger."""
    logger = Logger("mindocr", rank, log_fn=log_fn)
    logger.setup_logging_file(log_dir)

    return logger
mindocr.utils.loss_scaler
mindocr.utils.loss_scaler.get_loss_scales(cfg)
PARAMETER DESCRIPTION
cfg

configure dict of loss scaler

TYPE: dict

RETURNS DESCRIPTION

nn.Cell: scale_sens used to scale gradient

float

loss_scale used in optimizer (only used when the loss scaler type is static and drop_overflow_update is False)

Source code in mindocr\utils\loss_scaler.py
def get_loss_scales(cfg):
    """
    Args:
        cfg (dict): configure dict of loss scaler

    Returns:
        nn.Cell: scale_sens used to scale gradient
        float: loss_scale used in optimizer
            (only used when the loss scaler type is static and drop_overflow_update is False)
    """
    # loss scale is 1.0 by default
    loss_scale_manager = nn.FixedLossScaleUpdateCell(loss_scale_value=1.0)

    # Only when `FixedLossScaleManager` is used for training and the `drop_overflow_update` in
    # `FixedLossScaleManager` is set to False, then this value needs to be the same as the `loss_scale` in
    # `FixedLossScaleManager`
    # But we never use FixedLossScaleManager, so optimizer_loss_scale is always 1.
    optimizer_loss_scale = 1.0

    if "loss_scaler" in cfg:
        assert (
            "loss_scale" in cfg.loss_scaler
        ), "Must specify the value for `loss_scale` in the config if `loss_scaler` is used."
        if cfg.loss_scaler.type == "dynamic":
            # TODO: scale_window can be related to num_batches, e.g., scale_window = num_batches * 2
            scale_factor = cfg.loss_scaler.get("scale_factor", 2.0)
            scale_window = cfg.loss_scaler.get("scale_window", 2000)
            # adjust by gradient_accumulation_steps so that the scaling process is the same as that of
            # batch_size=batch_size*gradient_accumulation_steps
            grad_accu_steps = cfg.train.get("gradient_accumulation_steps", 1)
            if grad_accu_steps > 1:
                scale_factor = scale_factor ** (1 / grad_accu_steps)
                scale_window = scale_window * grad_accu_steps
                print(
                    "INFO: gradient_accumulation_steps > 1, scale_factor and scale_window are adjusted accordingly for "
                    "dynamic loss scaler"
                )
            loss_scale_manager = nn.DynamicLossScaleUpdateCell(
                loss_scale_value=cfg.loss_scaler.get("loss_scale", 2**16),
                scale_factor=scale_factor,
                scale_window=scale_window,
            )
        elif cfg.loss_scaler.type == "static":
            loss_scale = cfg.loss_scaler.get("loss_scale", 1.0)
            loss_scale_manager = nn.FixedLossScaleUpdateCell(loss_scale)
        else:
            raise ValueError(f"Available loss scaler types are `static` and `dynamic`, but got {cfg.loss_scaler}")

    return loss_scale_manager, optimizer_loss_scale
mindocr.utils.misc
mindocr.utils.misc.AverageMeter

Computes and stores the average and current value

Source code in mindocr\utils\misc.py
class AverageMeter:
    """Computes and stores the average and current value"""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        self.val = Tensor(0.0, dtype=ms.float32)
        self.avg = Tensor(0.0, dtype=ms.float32)
        self.sum = Tensor(0.0, dtype=ms.float32)
        self.count = Tensor(0.0, dtype=ms.float32)

    def update(self, val: Tensor, n: int = 1) -> None:
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count
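The same running-average bookkeeping works with plain floats; a framework-free version makes the count-weighted update easy to verify (this is a sketch mirroring the class above, not mindocr API):

```python
class PlainAverageMeter:
    """Track the latest value and the count-weighted running average,
    mirroring AverageMeter but without MindSpore tensors."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.val = self.avg = self.sum = self.count = 0.0

    def update(self, val, n=1):
        self.val = val            # most recent value
        self.sum += val * n       # weighted by batch size n
        self.count += n
        self.avg = self.sum / self.count

m = PlainAverageMeter()
for loss in (4.0, 2.0, 0.0):
    m.update(loss)
```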
mindocr.utils.model_wrapper
mindocr.utils.model_wrapper.NetWithEvalWrapper

Bases: nn.Cell

A universal wrapper for any network with any loss for evaluation pipeline. Difference from NetWithLossWrapper: it returns loss_val, pred, and labels.

PARAMETER DESCRIPTION
net

network

TYPE: nn.Cell

loss_fn

loss function, if None, will not compute loss for evaluation dataset

DEFAULT: None

input_indices

The indices of the data tuples which will be fed into the network. If it is None, then the first item will be fed only.

DEFAULT: None

label_indices

The indices of the data tuples which will be fed into the loss function. If it is None, then the remaining items will be fed.

DEFAULT: None

Source code in mindocr\utils\model_wrapper.py
class NetWithEvalWrapper(nn.Cell):
    """
    A universal wrapper for any network with any loss for evaluation pipeline.
    Difference from NetWithLossWrapper: it returns loss_val, pred, and labels.

    Args:
        net (nn.Cell): network
        loss_fn: loss function, if None, will not compute loss for evaluation dataset
        input_indices: The indices of the data tuples which will be fed into the network.
            If it is None, then the first item will be fed only.
        label_indices: The indices of the data tuples which will be fed into the loss function.
            If it is None, then the remaining items will be fed.
    """

    def __init__(self, net, loss_fn=None, input_indices=None, label_indices=None):
        super().__init__(auto_prefix=False)
        self._net = net
        self._loss_fn = loss_fn
        # TODO: get this automatically from net and loss func
        self.input_indices = input_indices
        self.label_indices = label_indices

    def construct(self, *args):
        """
        Args:
            args (Tuple): contains network inputs, labels (given by data loader)
        Returns:
            Tuple: loss value (Tensor), pred (Union[Tensor, Tuple[Tensor]]), labels (Tuple)
        """
        # TODO: pred is a dict
        if self.input_indices is None:
            pred = self._net(args[0])
        else:
            pred = self._net(*select_inputs_by_indices(args, self.input_indices))

        if self.label_indices is None:
            labels = args[1:]
        else:
            labels = select_inputs_by_indices(args, self.label_indices)

        if self._loss_fn is not None:
            loss_val = self._loss_fn(pred, *labels)
        else:
            loss_val = None

        return loss_val, pred, labels
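The index routing above can be illustrated with a plain-Python sketch; `select_inputs_by_indices` is assumed here to simply pick items from the data tuple by position (the real helper lives in the same module):

```python
def select_inputs_by_indices(inputs, indices):
    # Assumed behavior: pick items from the loader's data tuple by position.
    return [inputs[i] for i in indices]

# A data tuple as a loader might produce it: (image, polys, texts, ignore_flags)
batch = ("img", "polys", "texts", "ignore")

# input_indices=[0] -> the network sees only the image
net_inputs = select_inputs_by_indices(batch, [0])
# label_indices=[1, 3] -> the loss sees polys and ignore flags
labels = select_inputs_by_indices(batch, [1, 3])
print(net_inputs, labels)  # ['img'] ['polys', 'ignore']
```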
mindocr.utils.model_wrapper.NetWithEvalWrapper.construct(*args)
PARAMETER DESCRIPTION
args (Tuple): network inputs and labels (given by the data loader). DEFAULT: ()

RETURNS DESCRIPTION
Tuple: loss value (Tensor), pred (Union[Tensor, Tuple[Tensor]]), labels (Tuple)

Source code in mindocr\utils\model_wrapper.py
def construct(self, *args):
    """
    Args:
        args (Tuple): contains network inputs, labels (given by data loader)
    Returns:
        Tuple: loss value (Tensor), pred (Union[Tensor, Tuple[Tensor]]), labels (Tuple)
    """
    # TODO: pred is a dict
    if self.input_indices is None:
        pred = self._net(args[0])
    else:
        pred = self._net(*select_inputs_by_indices(args, self.input_indices))

    if self.label_indices is None:
        labels = args[1:]
    else:
        labels = select_inputs_by_indices(args, self.label_indices)

    if self._loss_fn is not None:
        loss_val = self._loss_fn(pred, *labels)
    else:
        loss_val = None

    return loss_val, pred, labels
mindocr.utils.model_wrapper.NetWithLossWrapper

Bases: nn.Cell

A universal wrapper for any network with any loss.

PARAMETER DESCRIPTION
net (nn.Cell): the network.
loss_fn: loss function.
pred_cast_fp32: if True, cast network predictions to float32 before computing the loss. DEFAULT: False
input_indices: indices of the data-tuple items fed into the network; if None, only the first item is fed. DEFAULT: None
label_indices: indices of the data-tuple items fed into the loss function; if None, the remaining items are fed. DEFAULT: None

Source code in mindocr\utils\model_wrapper.py
class NetWithLossWrapper(nn.Cell):
    """
    A universal wrapper for any network with any loss.

    Args:
        net (nn.Cell): network
        loss_fn: loss function
        pred_cast_fp32: if True, cast network predictions to float32 before computing the loss. Default: False
        input_indices: The indices of the data tuples which will be fed into the network.
            If it is None, then the first item will be fed only.
        label_indices: The indices of the data tuples which will be fed into the loss function.
            If it is None, then the remaining items will be fed.
    """

    def __init__(self, net, loss_fn, pred_cast_fp32=False, input_indices=None, label_indices=None):
        super().__init__(auto_prefix=False)
        self._net = net
        self._loss_fn = loss_fn
        # TODO: get this automatically from net and loss func
        self.input_indices = input_indices
        self.label_indices = label_indices
        self.pred_cast_fp32 = pred_cast_fp32
        self.cast = ops.Cast()

    def construct(self, *args):
        """
        Args:
            args (Tuple): contains network inputs, labels (given by data loader)
        Returns:
            loss_val (Tensor): loss value
        """
        if self.input_indices is None:
            pred = self._net(args[0])
        else:
            pred = self._net(*select_inputs_by_indices(args, self.input_indices))

        if self.pred_cast_fp32:
            if isinstance(pred, (list, tuple)):
                pred = [self.cast(p, mstype.float32) for p in pred]
            else:
                pred = self.cast(pred, mstype.float32)

        if self.label_indices is None:
            loss_val = self._loss_fn(pred, *args[1:])
        else:
            loss_val = self._loss_fn(pred, *select_inputs_by_indices(args, self.label_indices))

        return loss_val
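The wrapper's routing of data-tuple items into the network and the loss can be sketched in plain Python (no MindSpore; toy callables stand in for the network and loss function):

```python
class NetWithLossWrapperSketch:
    """Plain-Python sketch of NetWithLossWrapper's input/label routing."""

    def __init__(self, net, loss_fn, input_indices=None, label_indices=None):
        self._net = net
        self._loss_fn = loss_fn
        self.input_indices = input_indices
        self.label_indices = label_indices

    def __call__(self, *args):
        if self.input_indices is None:
            pred = self._net(args[0])  # default: feed only the first item
        else:
            pred = self._net(*[args[i] for i in self.input_indices])
        if self.label_indices is None:
            labels = args[1:]          # default: remaining items are labels
        else:
            labels = [args[i] for i in self.label_indices]
        return self._loss_fn(pred, *labels)

net = lambda x: x * 2                      # toy "network"
loss = lambda pred, label: abs(pred - label)
wrapper = NetWithLossWrapperSketch(net, loss)
print(wrapper(3, 5))                       # net(3)=6, |6 - 5| = 1
```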
mindocr.utils.model_wrapper.NetWithLossWrapper.construct(*args)
PARAMETER DESCRIPTION
args (Tuple): network inputs and labels (given by the data loader). DEFAULT: ()

RETURNS DESCRIPTION
loss_val (Tensor): loss value

Source code in mindocr\utils\model_wrapper.py
def construct(self, *args):
    """
    Args:
        args (Tuple): contains network inputs, labels (given by data loader)
    Returns:
        loss_val (Tensor): loss value
    """
    if self.input_indices is None:
        pred = self._net(args[0])
    else:
        pred = self._net(*select_inputs_by_indices(args, self.input_indices))

    if self.pred_cast_fp32:
        if isinstance(pred, (list, tuple)):
            pred = [self.cast(p, mstype.float32) for p in pred]
        else:
            pred = self.cast(pred, mstype.float32)

    if self.label_indices is None:
        loss_val = self._loss_fn(pred, *args[1:])
    else:
        loss_val = self._loss_fn(pred, *select_inputs_by_indices(args, self.label_indices))

    return loss_val
mindocr.utils.recorder
mindocr.utils.recorder.PerfRecorder

Bases: object

Source code in mindocr\utils\recorder.py
class PerfRecorder(object):
    def __init__(
        self,
        save_dir,
        metric_names: List = ["loss", "precision", "recall", "hmean", "s/epoch"],
        file_name="result.log",
        separator="\t",
        resume=False,
    ):
        self.save_dir = save_dir
        self.sep = separator
        if not os.path.exists(save_dir):
            os.makedirs(save_dir)
            print(f"{save_dir} does not exist. Created.")

        self.log_txt_fp = os.path.join(save_dir, file_name)
        if not resume:
            result_log = separator.join(["Epoch"] + metric_names)
            with open(self.log_txt_fp, "w", encoding="utf-8") as fp:
                fp.write(result_log + "\n")

    def add(self, epoch, *measures):
        """
        measures (Tuple): measurement values corresponding to the metric names
        """
        sep = self.sep
        line = f"{epoch}{sep}"
        for i, m in enumerate(measures):
            if isinstance(m, ms.Tensor):
                m = m.asnumpy()

            if isinstance(m, (float, np.float32)):
                line += f"{m:.4f}"
            elif m is None:
                line += "NA"
            else:
                line += f"{m}"

            if i < len(measures) - 1:
                line += f"{sep}"
        # line += f"{epoch_time:.2f}\n"

        with open(self.log_txt_fp, "a", encoding="utf-8") as fp:
            fp.write(line + "\n")

    def save_curves(self):
        plot_result(self.log_txt_fp, save_fig=True, sep=self.sep)
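The log format produced by PerfRecorder can be reproduced with a small plain-Python sketch (no MindSpore or NumPy; a temporary directory stands in for `save_dir`). Floats are written with four decimals and `None` becomes `NA`, mirroring `add()` above:

```python
import os
import tempfile

save_dir = tempfile.mkdtemp()
metric_names = ["loss", "precision", "recall", "hmean", "s/epoch"]
log_fp = os.path.join(save_dir, "result.log")

# header line, as written by __init__ when resume=False
with open(log_fp, "w", encoding="utf-8") as fp:
    fp.write("\t".join(["Epoch"] + metric_names) + "\n")

def add(epoch, *measures):
    # mirror PerfRecorder.add: floats get 4 decimals, None becomes "NA"
    cells = [str(epoch)]
    for m in measures:
        if isinstance(m, float):
            cells.append(f"{m:.4f}")
        elif m is None:
            cells.append("NA")
        else:
            cells.append(str(m))
    with open(log_fp, "a", encoding="utf-8") as fp:
        fp.write("\t".join(cells) + "\n")

add(1, 0.531, 0.9123, 0.8876, None, 12.3)
print(open(log_fp, encoding="utf-8").read())
```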
mindocr.utils.recorder.PerfRecorder.add(epoch, *measures)

measures (Tuple): measurement values corresponding to the metric names

Source code in mindocr\utils\recorder.py
def add(self, epoch, *measures):
    """
    measures (Tuple): measurement values corresponding to the metric names
    """
    sep = self.sep
    line = f"{epoch}{sep}"
    for i, m in enumerate(measures):
        if isinstance(m, ms.Tensor):
            m = m.asnumpy()

        if isinstance(m, float) or isinstance(m, np.float32):
            line += f"{m:.4f}"
        elif m is None:
            line += "NA"
        else:
            line += f"{m}"

        if i < len(measures) - 1:
            line += f"{sep}"
    # line += f"{epoch_time:.2f}\n"

    with open(self.log_txt_fp, "a", encoding="utf-8") as fp:
        fp.write(line + "\n")
mindocr.utils.seed

Random seed utilities.

mindocr.utils.seed.set_seed(seed=42)

Note: to ensure model init stability, rank_id is removed from seed.

Source code in mindocr\utils\seed.py
def set_seed(seed=42):
    """
    seed (int): the random seed. Default: 42

    Note: to ensure model init stability, rank_id is removed from seed.
    """
    # if rank is None:
    #    rank = 0
    random.seed(seed)
    ms.set_seed(seed)
    np.random.seed(seed)
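The effect of reseeding can be demonstrated with Python's `random` module alone (the real helper additionally seeds `mindspore` and `numpy` in the same way):

```python
import random

def set_seed(seed=42):
    # Seeding only Python's `random` here for illustration;
    # set_seed in mindocr also calls ms.set_seed and np.random.seed.
    random.seed(seed)

set_seed(42)
a = [random.random() for _ in range(3)]
set_seed(42)
b = [random.random() for _ in range(3)]
print(a == b)  # True: reseeding reproduces the same sequence
```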
mindocr.utils.train_step_wrapper

Train-step wrapper supporting drop-overflow update, EMA, etc.

mindocr.utils.train_step_wrapper.TrainOneStepWrapper

Bases: nn.TrainOneStepWithLossScaleCell

TrainStep with EMA and gradient clipping.

PARAMETER DESCRIPTION
drop_overflow_update: if True, the network is not updated when a gradient overflow occurs. DEFAULT: True
scale_sense (Union[Tensor, Cell]): if a Cell, it is called to update the loss scale; if a Tensor, the loss scale can be modified via set_sense_scale, and its shape should be () or (1,). DEFAULT: 1.0

RETURNS DESCRIPTION
Tuple of 3 Tensors: the loss, the overflow flag, and the current loss scale value.

loss (Tensor) - a scalar, the loss value.

overflow (Tensor) - a scalar bool, whether an overflow occurred.

loss scale (Tensor) - the loss scale value, with shape () or (1,).

Source code in mindocr\utils\train_step_wrapper.py
161
class TrainOneStepWrapper(nn.TrainOneStepWithLossScaleCell):
    """TrainStep with ema and clip grad.

    Args:
        drop_overflow_update: if True, the network will not be updated when a gradient overflow occurs.
        scale_sense (Union[Tensor, Cell]): If this value is a Cell, it will be called
            to update loss scale. If this value is a Tensor, the loss scale can be modified by `set_sense_scale`,
            the shape should be :math:`()` or :math:`(1,)`.

    Returns:
        Tuple of 3 Tensor, the loss, overflow flag and current loss scale value.
        loss (Tensor) -  A scalar, the loss value.
        overflow (Tensor) -  A scalar bool, whether an overflow occurred.
        loss scale (Tensor) -  The loss scale value, the shape is :math:`()` or :math:`(1,)`.

    """

    def __init__(
        self,
        network,
        optimizer,
        scale_sense=1.0,
        ema=None,
        updates=0,
        drop_overflow_update=True,
        gradient_accumulation_steps=1,
        clip_grad=False,
        clip_norm=1.0,
        verbose=False,
    ):
        super().__init__(network, optimizer, scale_sense)
        self.ema = ema
        self.drop_overflow_update = drop_overflow_update

        assert isinstance(clip_grad, bool), f"Invalid type of clip_grad, got {type(clip_grad)}, expected bool"
        assert isinstance(clip_norm, float) and clip_norm > 0.0, f"clip_norm must be a float > 0.0, but got {clip_norm}"
        self.clip_grad = clip_grad
        self.clip_norm = clip_norm

        assert gradient_accumulation_steps >= 1
        self.grad_accu_steps = gradient_accumulation_steps
        if gradient_accumulation_steps > 1:
            # additionally caches network trainable parameters. overhead caused.
            # TODO: try to store it in CPU memory instead of GPU/NPU memory.
            self.accumulated_grads = optimizer.parameters.clone(prefix="grad_accumulated_", init="zeros")
            self.zeros = optimizer.parameters.clone(prefix="zeros_", init="zeros")
            self.cur_accu_step = Parameter(Tensor(0, ms.int32), "grad_accumulate_step_", requires_grad=False)
            self.zero = Tensor(0, ms.int32)
            for p in self.accumulated_grads:
                p.requires_grad = False
            for z in self.zeros:
                z.requires_grad = False

        self.verbose = verbose
        self.is_cpu_device = context.get_context("device_target") == "CPU"  # to support CPU in CI

        self.map = ops.Map()
        self.partial = ops.Partial()

    def construct(self, *inputs):
        # compute loss
        weights = self.weights
        loss = self.network(*inputs)  # mini-batch loss
        scaling_sens = self.scale_sense

        # check loss overflow
        if not self.is_cpu_device:
            status, scaling_sens = self.start_overflow_check(loss, scaling_sens)
        else:
            status = None

        scaling_sens_filled = C.ones_like(loss) * F.cast(scaling_sens, F.dtype(loss))  # loss scale value

        # 1. compute gradients (of the up-scaled loss w.r.t. the model weights)
        grads = self.grad(self.network, weights)(*inputs, scaling_sens_filled)

        # 2. down-scale gradients by the loss scale: grads = grads / scaling_sens / grad_accu_steps
        # also divide gradients by the accumulation steps to avoid averaging the accumulated gradients later
        grads = self.hyper_map(F.partial(_grad_scale, scaling_sens * self.grad_accu_steps), grads)

        # 3. check gradient overflow
        if not self.is_cpu_device:
            cond = self.get_overflow_status(status, grads)
            overflow = self.process_loss_scale(cond)
        else:
            overflow = ms.Tensor(False)
            cond = ms.Tensor(False)

        # accumulate gradients and update model weights if no overflow or allow to update even when overflow
        if (not self.drop_overflow_update) or (not overflow):
            # 4. gradient accumulation if enabled
            if self.grad_accu_steps > 1:
                # self.accumulated_grads += grads
                loss = F.depend(loss, self.map(self.partial(ops.assign_add), self.accumulated_grads, grads))
                # self.cur_accu_step += 1
                loss = F.depend(loss, ops.assign_add(self.cur_accu_step, Tensor(1, ms.int32)))

                if self.cur_accu_step % self.grad_accu_steps == 0:
                    # 5. gradient reduction on distributed GPUs/NPUs
                    grads = self.grad_reducer(self.accumulated_grads)

                    # 6. clip grad
                    if self.clip_grad:
                        grads = ops.clip_by_global_norm(grads, self.clip_norm)
                    # 7. optimize
                    loss = F.depend(loss, self.optimizer(grads))

                    # clear gradient accumulation states
                    loss = F.depend(
                        loss, self.map(self.partial(ops.assign), self.accumulated_grads, self.zeros)
                    )  # self.accumulated_grads = 0
                    loss = F.depend(loss, ops.assign(self.cur_accu_step, self.zero))  # self.cur_accu_step = 0
                else:
                    # update LR in each gradient step but not optimize net parameter to ensure the LR curve is
                    # consistent
                    loss = F.depend(loss, self.optimizer.get_lr())  # .get_lr() will make lr step increased by 1
            else:
                # 5. gradient reduction on distributed GPUs/NPUs
                grads = self.grad_reducer(grads)
                # 6. clip grad
                if self.clip_grad:
                    grads = ops.clip_by_global_norm(grads, self.clip_norm)
                # 7. optimize
                loss = F.depend(loss, self.optimizer(grads))

            # 8.ema
            if self.ema is not None:
                self.ema.ema_update()
        else:
            # print("WARNING: Gradient overflow! update skipped.")
            pass

        return loss, cond, scaling_sens
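The scaling and accumulation arithmetic in steps 1-4 can be checked with plain numbers: each gradient of the up-scaled loss is divided by `scale * grad_accu_steps` (step 2), so summing over the accumulation window recovers the mean of the unscaled per-step gradients:

```python
scale = 1024.0        # loss scale value (scaling_sens)
grad_accu_steps = 4   # gradient accumulation window

# per-step gradients of the *unscaled* loss (toy values)
raw_grads = [0.2, 0.4, 0.6, 0.8]

accumulated = 0.0
for g in raw_grads:
    scaled_g = g * scale  # step 1: gradient of the up-scaled loss
    # step 2: down-scale by scale * grad_accu_steps, then accumulate (step 4)
    accumulated += scaled_g / (scale * grad_accu_steps)

print(accumulated)  # ~0.5, the mean of raw_grads
```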

mindocr.version

version init